Named-Entity Recognition (NER)

Overview

Named-Entity Recognition (NER) is a text-processing technique that scans unstructured text and identifies the named entities in it, such as people, organizations, and locations.

Regular expressions (regex) work well for values with a fixed format, such as IDs or phone numbers, but they often miss entities that follow no fixed pattern, such as personal names. NER fills this gap.
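The contrast can be illustrated with a small sketch using only Python's `re` module. The sample sentence, the phone-number format, and the "capitalized word" heuristic are all made up for illustration; they are not part of the NER service.

```python
import re

text = "Call Budi Santoso at 0812-3456-7890 about the Apple invoice."

# A fixed-format value: a regex captures it reliably.
# ASSUMPTION: phone numbers look like dddd-dddd-dddd in this sample.
PHONE = re.compile(r"\b\d{4}-\d{4}-\d{4}\b")
phones = PHONE.findall(text)

# A naive stand-in for "find names": any capitalized word.
# It cannot tell a person from a sentence-initial verb or a company.
CAPITALIZED = re.compile(r"\b[A-Z][a-z]+\b")
candidates = CAPITALIZED.findall(text)

print(phones)      # ['0812-3456-7890']
print(candidates)  # ['Call', 'Budi', 'Santoso', 'Apple']
```

The phone pattern returns exactly one match, while the name heuristic mixes a verb, a person's name, and a company name with no way to label them. That labeling step is what an NER model adds.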

Key capabilities:

  • Unstructured detection: Detects entities that have no fixed format, namely people (PERSON), organizations (ORGANIZATION), and locations (LOCATION).

  • Context-aware: Distinguishes between "Apple" the fruit and "Apple" the company based on sentence context.


API Categories

The GDP Labs NER service exposes two categories of APIs:

1. /analyze — Detect Entities

Extracts entities with their type, confidence score, and character offsets.

  • Use case: You want to know what is in the text to tag it, filter it, or inspect it.
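To show how the three pieces of information (type, confidence score, character offsets) might be used together, here is a minimal sketch. The `Entity` class, its field names, the end-exclusive offset convention, and the sample values are assumptions made for illustration; they are not the service's actual response schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    # ASSUMPTION: hypothetical field names, not the real response keys.
    entity_type: str
    score: float
    start: int  # offset of the first character
    end: int    # offset one past the last character (assumed exclusive)


def confident_spans(
    text: str, entities: List[Entity], min_score: float = 0.8
) -> List[Tuple[str, str]]:
    """Return (type, matched text) for entities at or above min_score."""
    return [
        (e.entity_type, text[e.start:e.end])
        for e in entities
        if e.score >= min_score
    ]


text = "Budi works at GDP Labs in Jakarta."
# Hand-written sample values, not real model output.
entities = [
    Entity("PERSON", 0.98, 0, 4),
    Entity("ORGANIZATION", 0.91, 14, 22),
    Entity("LOCATION", 0.55, 26, 33),
]

print(confident_spans(text, entities))
# [('PERSON', 'Budi'), ('ORGANIZATION', 'GDP Labs')]
```

The offsets let the caller recover the exact matched text, and the score gives a simple knob for trading recall against false positives when tagging or filtering.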

2. /anonymize — Mask Entities

Returns the text with PII already masked/replaced.

  • Use case: You want a clean string immediately for storage or LLM consumption without custom logic.


1. Base URL & Authentication

Staging Base URL: https://stag-api-gdplabs-ner-api.obrol.id

Required Headers

  • x-api-key: <YOUR_API_KEY>

  • Content-Type: application/json

Contact your manager or the infrastructure team to obtain a valid API Key.
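One way to keep the key out of source code is to read it from the environment when building the headers. This is a minimal sketch; the environment variable name `NER_API_KEY` and the helper `build_headers` are hypothetical, and only the two header names come from this page.

```python
import os
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the two required headers for the NER service."""
    # ASSUMPTION: NER_API_KEY is a made-up variable name for this sketch.
    key = api_key or os.environ.get("NER_API_KEY")
    if not key:
        raise ValueError("No API key given and NER_API_KEY is not set")
    return {
        "x-api-key": key,
        "Content-Type": "application/json",
    }


headers = build_headers("example-key")
```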


2. Option A – Direct HTTP API

Use this option if you are working in a non-Python environment (Node.js, Go, etc.) or want direct control over the HTTP requests.

2.1 POST /analyze — Detect Entities

Endpoint: POST /analyze

Request:

Response:

cURL Example:
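As a scripted counterpart to a cURL call, the request can be sketched with Python's standard library. This is a sketch under stated assumptions, not the service's documented contract: the JSON body field `text` is assumed, and the response is returned as parsed JSON without interpreting its fields. Only the base URL, the path, and the two headers come from this page.

```python
import json
import urllib.request

# Staging base URL as given on this page.
BASE_URL = "https://stag-api-gdplabs-ner-api.obrol.id"


def build_analyze_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /analyze request."""
    # ASSUMPTION: the body is a JSON object with a single "text" field.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/analyze",
        data=body,
        method="POST",
        headers={
            "x-api-key": api_key,
            "Content-Type": "application/json",
        },
    )


def analyze(text: str, api_key: str, timeout: float = 10.0):
    """Send the request and return the parsed JSON response as-is."""
    request = build_analyze_request(text, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting the builder from the sender keeps the request shape easy to inspect or log before any network call is made.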

2.2 POST /anonymize — Mask PII

Endpoint: POST /anonymize

Request:

Response:

cURL Example:
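A similar sketch applies to the masking endpoint. As before, the request body field `text` is an assumption, and because the response schema is not shown here, the function simply returns whatever JSON the server sends. The `send` parameter is a hypothetical hook added so the call can be exercised without network access.

```python
import json
import urllib.request
from typing import Any, Callable

# Staging base URL as given on this page.
BASE_URL = "https://stag-api-gdplabs-ner-api.obrol.id"


def anonymize(
    text: str,
    api_key: str,
    send: Callable[[urllib.request.Request], Any] = urllib.request.urlopen,
):
    """POST the text to /anonymize and return the parsed JSON response."""
    request = urllib.request.Request(
        url=f"{BASE_URL}/anonymize",
        # ASSUMPTION: the body is a JSON object with a single "text" field.
        data=json.dumps({"text": text}).encode("utf-8"),
        method="POST",
        headers={
            "x-api-key": api_key,
            "Content-Type": "application/json",
        },
    )
    # `send` must return a context manager with a read() method,
    # which is what urllib.request.urlopen returns.
    with send(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

In a test, `send` can be replaced by a function that returns an `io.BytesIO` holding canned JSON, so the request construction is verified without touching the staging server.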


3. Option B – Using gllm-privacy SDK

If you are using Python, the gllm-privacy library provides a wrapper around the NER service.

Note: The SDK uses the /analyze endpoint under the hood to detect entities, but performs the actual anonymization locally using its own TextAnonymizer. This gives you more control over the masking strategy (e.g., placeholders or fake data).
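The idea of detecting remotely and masking locally can be sketched in plain Python. This is not the gllm-privacy implementation or its API; the function `mask_with_placeholders`, the `(start, end, entity_type)` tuple layout, and the sample offsets are hypothetical and only illustrate placeholder-style masking driven by character offsets.

```python
from typing import List, Tuple

# (start, end, entity_type), with `end` exclusive. Hypothetical layout.
Span = Tuple[int, int, str]


def mask_with_placeholders(text: str, spans: List[Span]) -> str:
    """Replace each non-overlapping span with a <TYPE> placeholder."""
    result = text
    # Work from the last span to the first so that earlier offsets
    # stay valid while the string changes length.
    for start, end, entity_type in sorted(spans, reverse=True):
        result = result[:start] + f"<{entity_type}>" + result[end:]
    return result


text = "Budi works at GDP Labs in Jakarta."
# Hand-written offsets standing in for detection results.
spans = [(0, 4, "PERSON"), (14, 22, "ORGANIZATION"), (26, 33, "LOCATION")]

print(mask_with_placeholders(text, spans))
# <PERSON> works at <ORGANIZATION> in <LOCATION>.
```

Because the replacement happens on the client, swapping the placeholder for another strategy (for example, generated fake values) only changes the string inserted at each span.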

3.1 Installation

3.2 Configuration & Initialization

To use the remote service, you must configure the GDPLabsNerApiRemoteRecognizer.

3.3 Run Analysis (Detection)

Use this step to inspect what the NER model detects before taking action.

3.4 Run Anonymization

Use the TextAnonymizer to mask the entities detected by the remote service.


4. When to Use What?

| Feature | HTTP API | SDK (gllm-privacy) |
| --- | --- | --- |
| Environment | Non-Python (Node, Go, etc.) | Python applications |
| Detection | Returns raw JSON of entities. | Returns RecognizerResult objects. |
| Anonymization | Server-side masking (***). | Client-side masking (placeholders or fake data). |
| Flexibility | Fixed output format. | Highly customizable (add regex, custom logic, etc.). |
