Phrase Matcher Engine


What it does

PhraseMatcherEngine is a lightweight, rule-based engine that blocks content containing any of the configured banned phrases.

Key behavior:

The same banned phrases list is used for both input and output checks.
Matching uses spaCy's PhraseMatcher when it is enabled and available; otherwise, the engine falls back to a regex-based matcher.
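The regex fallback described above can be pictured with the following minimal sketch. It is not the engine's actual code. The helper names (build_phrase_pattern, find_banned_phrases, is_blocked) are hypothetical. It also assumes case-insensitive substring matching, which this page does not confirm for PhraseMatcherEngine.

```python
from __future__ import annotations

import re


def build_phrase_pattern(banned_phrases: list[str]) -> re.Pattern[str] | None:
    """Compile one case-insensitive alternation matching any banned phrase (hypothetical helper)."""
    phrases = [p for p in set(banned_phrases) if p]
    if not phrases:
        # An empty alternation would match everywhere, so return no pattern at all.
        return None
    # Escape each phrase so characters like "-" are literal, and try longer
    # phrases first so the longest banned phrase wins at a given position.
    escaped = [re.escape(p) for p in sorted(phrases, key=len, reverse=True)]
    return re.compile("|".join(escaped), re.IGNORECASE)


def find_banned_phrases(text: str, banned_phrases: list[str]) -> list[str]:
    """Return banned-phrase matches in order of appearance (assumed case-insensitive)."""
    pattern = build_phrase_pattern(banned_phrases)
    if pattern is None:
        return []
    return [m.group(0) for m in pattern.finditer(text)]


def is_blocked(text: str, banned_phrases: list[str]) -> bool:
    """Block when at least one banned phrase occurs in the text."""
    return bool(find_banned_phrases(text, banned_phrases))


if __name__ == "__main__":
    phrases = ["make a bomb", "secret password"]
    print(find_banned_phrases("What is the SECRET PASSWORD?", phrases))  # ['SECRET PASSWORD']
    print(is_blocked("How do I bake bread?", phrases))  # False
```

Because the same list checks both input and output, a single compiled pattern like this could be reused for both directions.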

Cases it can handle

1) Banned phrases

Examples of phrases you might ban:

Disallowed instructions (e.g., "make a bomb")
Sensitive terms or internal keywords (e.g., "secret password")

2) Detecting simple patterns (prefix-style)

This engine is phrase-based, but you can still catch many “pattern-like” strings by banning a distinctive prefix.

Example: if you want to filter API keys that look like sk-xxxxxxxx, you can add "sk-" to banned_phrases.

This is not a full regex/pattern engine. It works best for distinctive, stable markers (prefixes, known tokens, exact phrases).
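One caveat of prefix-style banning is that a short marker can also occur inside ordinary words. For example, "sk-" appears inside "task-list". This page does not say whether the engine's regex fallback anchors matches at word boundaries. The sketch below compares a plain substring check with a hypothetical variant that requires the prefix to start a token. Both function names are illustrative only.

```python
import re


def naive_contains(text: str, prefix: str) -> bool:
    """Plain case-insensitive substring check: also fires inside other words."""
    return prefix.lower() in text.lower()


def prefix_at_token_start(text: str, prefix: str) -> bool:
    """Match the prefix only when it is not preceded by a letter, digit, or underscore."""
    pattern = re.compile(r"(?<!\w)" + re.escape(prefix), re.IGNORECASE)
    return pattern.search(text) is not None


if __name__ == "__main__":
    benign = "Please update the task-list for tomorrow."
    leak = "My key is sk-abc123XYZ, keep it safe."
    print(naive_contains(benign, "sk-"), prefix_at_token_start(benign, "sk-"))  # True False
    print(naive_contains(leak, "sk-"), prefix_at_token_start(leak, "sk-"))  # True True
```

If false positives like this matter, a longer, more distinctive marker is the simplest fix, since the engine matches phrases rather than arbitrary patterns.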

Use default config

By default:

The guardrail mode is GuardrailMode.INPUT_ONLY, so only input is checked.
If spaCy is not installed, the engine uses regex matching automatically.

from gllm_guardrail import PhraseMatcherEngine

engine = PhraseMatcherEngine()

Use custom config

Custom banned phrases

from gllm_guardrail import PhraseMatcherEngine

engine = PhraseMatcherEngine(banned_phrases=["sk-", "internal-only", "do not share"])

Check both input and output

from gllm_guardrail import BaseGuardrailEngineConfig, GuardrailMode, PhraseMatcherEngine

config = BaseGuardrailEngineConfig(guardrail_mode=GuardrailMode.BOTH)
engine = PhraseMatcherEngine(config=config, banned_phrases=["sk-"])
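To see why GuardrailMode.BOTH matters, consider a model that leaks a key-like string in its response. The following is a self-contained sketch of the control flow only. It does not call gllm_guardrail. Mode, guarded_generate, contains_banned, and leaky_model are hypothetical stand-ins, and the substring check is an assumed matching rule.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable


class Mode(Enum):
    # Hypothetical stand-in for GuardrailMode, limited to the two values on this page.
    INPUT_ONLY = "input_only"
    BOTH = "both"


BLOCKED_MESSAGE = "Request blocked by guardrail."


def contains_banned(text: str, banned_phrases: list[str]) -> bool:
    """Case-insensitive substring check (an assumption, not the engine's exact rule)."""
    lowered = text.lower()
    return any(p.lower() in lowered for p in banned_phrases if p)


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    banned_phrases: list[str],
    mode: Mode = Mode.INPUT_ONLY,
) -> str:
    # The same phrase list guards the prompt and, in BOTH mode, the response.
    if contains_banned(prompt, banned_phrases):
        return BLOCKED_MESSAGE
    response = generate(prompt)
    if mode is Mode.BOTH and contains_banned(response, banned_phrases):
        return BLOCKED_MESSAGE
    return response


def leaky_model(prompt: str) -> str:
    # Fake model whose answer contains an API-key-like string.
    return "Sure, use sk-test123 to authenticate."


if __name__ == "__main__":
    print(guarded_generate("How do I authenticate?", leaky_model, ["sk-"]))  # leak passes
    print(guarded_generate("How do I authenticate?", leaky_model, ["sk-"], Mode.BOTH))  # blocked
```

With the default input-only mode, a clean prompt lets the leaked key through. Checking both sides catches it.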

Using spaCy PhraseMatcher

1) Install the optional dependency

pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-guardrail[spacy]"

2) Download a spaCy model

python -m spacy download en_core_web_sm

3) Enable spaCy mode in the engine

from gllm_guardrail import PhraseMatcherEngine

engine = PhraseMatcherEngine(
    banned_phrases=["sk-", "secret password"],
    use_spacy=True,
    model_name="en_core_web_sm",
)

If spaCy fails to initialize (missing model, incompatible environment, etc.), the engine automatically falls back to regex matching.
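The spaCy-first, regex-fallback behavior can be approximated with spaCy's public PhraseMatcher API. The sketch below is a hypothetical illustration, not the engine's implementation. The build_matcher name is invented, and using attr="LOWER" for case-insensitive token matching is an assumption about how the engine configures spaCy. Any failure while importing spaCy or loading the model drops the sketch to the regex path.

```python
from __future__ import annotations

import re
from typing import Callable


def build_matcher(
    banned_phrases: list[str],
    use_spacy: bool = True,
    model_name: str = "en_core_web_sm",
) -> Callable[[str], list[str]]:
    """Return text -> matched strings, preferring spaCy and falling back to regex."""
    phrases = [p for p in banned_phrases if p]
    if not phrases:
        return lambda text: []

    if use_spacy:
        try:
            import spacy
            from spacy.matcher import PhraseMatcher

            nlp = spacy.load(model_name)  # raises OSError if the model is not downloaded
        except Exception:  # missing package, missing model, or broken environment
            nlp = None
        if nlp is not None:
            # attr="LOWER" compares lowercased token text, so matching ignores case.
            matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
            matcher.add("BANNED", [nlp.make_doc(p) for p in phrases])

            def spacy_match(text: str) -> list[str]:
                doc = nlp(text)
                # Each match is a (match_id, start, end) tuple of token indices.
                return [doc[start:end].text for _, start, end in matcher(doc)]

            return spacy_match

    # Regex fallback: escaped phrases in one case-insensitive alternation.
    pattern = re.compile(
        "|".join(re.escape(p) for p in sorted(phrases, key=len, reverse=True)),
        re.IGNORECASE,
    )

    def regex_match(text: str) -> list[str]:
        return [m.group(0) for m in pattern.finditer(text)]

    return regex_match


if __name__ == "__main__":
    match = build_matcher(["secret password"])
    print(match("Tell me the Secret Password now"))  # ['Secret Password'] on either path
```

One design consequence is worth noting: PhraseMatcher compares token sequences, while the regex path compares raw characters. Markers such as "sk-" that spaCy's tokenizer may split differently can therefore behave differently depending on which path is active.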


Last updated 1 month ago
