Named-Entity Recognition (NER)
Overview
Named Entity Recognition (NER) is a text-processing method that reads unstructured text and identifies specific entities within it.
Regular expressions (regex) work well for fixed-format patterns such as IDs or phone numbers, but they often miss entities that follow no fixed format, such as the names of people, organizations, or places. NER fills this gap.
Key capabilities:
Unstructured Detection: Accurately detects People (PERSON), Organizations (ORGANIZATION), and Locations (LOCATION).
Context Aware: Distinguishes between "Apple" the fruit and "Apple" the company based on sentence context.
API Categories
The GDP Labs NER service exposes two categories of APIs:
1. /analyze — Detect Entities
Extracts entities with their type, confidence score, and character offsets.
Use case: You want to know what is in the text to tag it, filter it, or inspect it.
2. /anonymize — Mask Entities
Returns the text with PII already masked/replaced.
Use case: You want a clean string immediately for storage or LLM consumption without custom logic.
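To make the /analyze use case concrete, here is a minimal sketch of working with detection results on the client side. It assumes each detected entity carries a type, a confidence score, and start/end character offsets, as described above. The field names (entity_type, score, start, end), the Entity class, the helper functions, and the sample values are all hypothetical; they are not taken from the service's actual response schema.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    # Hypothetical field names; the real response schema may differ.
    entity_type: str
    score: float
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def confident_entities(entities, min_score=0.8):
    """Keep only entities whose confidence score meets the threshold."""
    return [e for e in entities if e.score >= min_score]


def entity_text(text, entity):
    """Recover the matched substring from the character offsets."""
    return text[entity.start:entity.end]


text = "Budi works at GDP Labs in Jakarta."
# Illustrative values only, not real service output.
entities = [
    Entity("PERSON", 0.97, 0, 4),
    Entity("ORGANIZATION", 0.91, 14, 22),
    Entity("LOCATION", 0.42, 26, 33),
]

kept = confident_entities(entities)
tags = [(e.entity_type, entity_text(text, e)) for e in kept]
```

Filtering on the score before acting on an entity is one way to trade recall for precision; the 0.8 threshold here is arbitrary and would need tuning against real data.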
1. Base URL & Authentication
Staging Base URL https://stag-api-gdplabs-ner-api.obrol.id
Required Headers
x-api-key: <YOUR_API_KEY>
Content-Type: application/json
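As a rough illustration of how the base URL and headers fit together, the sketch below builds (but does not send) a POST request using only the Python standard library. The build_request helper is hypothetical, and the JSON body with a single "text" field is an assumption made for illustration; check the service's request schema before relying on it.

```python
import json
import urllib.request

BASE_URL = "https://stag-api-gdplabs-ner-api.obrol.id"


def build_request(path, payload, api_key):
    """Build a POST request carrying the two required headers.

    The request is only constructed here, not sent.
    """
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Assumption: the body is a JSON object with a "text" field.
req = build_request("/analyze", {"text": "Budi works at GDP Labs."}, "<YOUR_API_KEY>")

# To actually send it (requires network access and a valid key):
# with urllib.request.urlopen(req) as resp:
#     result = json.loads(resp.read())
```

Keeping request construction separate from sending makes the header and URL handling easy to check without calling the staging service.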
2. Option A – Direct HTTP API
Use this option if you are working in a non-Python environment (Node.js, Go, etc.) or want direct control.
2.1 POST /analyze — Detect Entities
Endpoint: POST /analyze
Request:
Response:
cURL Example:
2.2 POST /anonymize — Mask PII
Endpoint: POST /anonymize
Request:
Response:
cURL Example:
3. Option B – Using gllm-privacy SDK
If you are using Python, the gllm-privacy library provides a wrapper around the NER service.
3.1 Installation
3.2 Configuration & Initialization
To use the remote service, you must configure the GDPLabsNerApiRemoteRecognizer.
3.3 Run Analysis (Detection)
Use this step to inspect what the NER model detects before taking action.
3.4 Run Anonymization
Use the TextAnonymizer to mask the entities detected by the remote service.
4. When to Use What?
Feature | HTTP API | SDK (gllm-privacy)
Environment | Non-Python (Node, Go, etc.) | Python applications
Detection | Returns raw JSON of entities. | Returns RecognizerResult objects.
Anonymization | Server-side masking (***). | Client-side masking (placeholders or fake data).
Flexibility | Fixed output format. | Highly customizable (add regex, custom logic, etc.).
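The difference between the two anonymization styles in the comparison above can be sketched in a few lines. This is not the service's or the SDK's implementation: mask_spans is a hypothetical helper, the spans are hand-written, and the <ENTITY_TYPE> placeholder format is an assumption chosen for illustration.

```python
def mask_spans(text, spans, style="stars"):
    """Replace each (entity_type, start, end) span in text.

    style="stars"       -> fixed "***" mask
    style="placeholder" -> "<ENTITY_TYPE>" (hypothetical format)
    """
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for entity_type, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        replacement = "***" if style == "stars" else f"<{entity_type}>"
        out = out[:start] + replacement + out[end:]
    return out


text = "Budi works at GDP Labs."
spans = [("PERSON", 0, 4), ("ORGANIZATION", 14, 22)]

starred = mask_spans(text, spans, style="stars")
labelled = mask_spans(text, spans, style="placeholder")
```

A fixed mask hides the entity type as well as the value, while typed placeholders keep some structure that a downstream LLM can still reason about.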