Guardrail
Overview
Guardrails are critical security and safety components that provide content filtering and safety checks for both user inputs and AI-generated responses. They act as a safety net, ensuring that your AI application adheres to safety standards, avoids harmful content, and stays on-topic.
The guardrail system is designed to be modular and extensible, allowing you to orchestrate multiple safety engines—from simple keyword matching to advanced LLM-based analysis using frameworks like NVIDIA's NeMo Guardrails.
What are Guardrails?
Guardrails screen content at two critical points:
User input (queries, prompts, context) before it reaches an LLM
Model output (responses) before it reaches end users
In gllm-guardrail, moderation is implemented via guardrail engines that are orchestrated by a GuardrailManager.
Guardrail engines (overview)
Engines implement a simple async interface: check_input() and check_output(). Each engine has a guardrail_mode that decides what it checks:
GuardrailMode.INPUT_ONLY: check only input (default)
GuardrailMode.OUTPUT_ONLY: check only output
GuardrailMode.BOTH: check both input and output
GuardrailMode.DISABLED: skip the engine entirely
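For illustration, here is a minimal sketch of configuring a mode. The import path and the idea that guardrail_mode can be set at construction are assumptions; this page only states that each engine carries the attribute:

```python
# Assumed import path; adjust to the actual package layout.
from gllm_guardrail import GuardrailMode, PhraseMatcherEngine

# An engine configured to screen model output only.
# banned_phrases and the guardrail_mode keyword are illustrative assumptions.
engine = PhraseMatcherEngine(
    banned_phrases=["confidential"],
    guardrail_mode=GuardrailMode.OUTPUT_ONLY,
)
```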
This library ships with two engines:
PhraseMatcherEngine (rule-based): lightweight banned phrase detection. Deep dive: Phrase Matcher Engine
NemoGuardrailEngine (LLM-based): NVIDIA NeMo Guardrails integration for more complex guardrails. Deep dive: NeMo Engine
Supported engines
Currently, the component supports two guardrail engines: PhraseMatcherEngine (rule-based) and NemoGuardrailEngine (LLM-based).
Prerequisites
Installation
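This page doesn't document the distribution channel; assuming the package is published under the library's name, installation would typically be:

```bash
pip install gllm-guardrail
```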
Quickstart
1) Input-only moderation (string)
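A minimal sketch, assuming the import path below, a banned_phrases constructor parameter on PhraseMatcherEngine, and an awaitable check_content() (none of which are confirmed on this page):

```python
import asyncio

# Assumed import path; adjust to the actual package layout.
from gllm_guardrail import GuardrailManager, PhraseMatcherEngine

async def main() -> None:
    engine = PhraseMatcherEngine(banned_phrases=["secret sauce"])  # assumed parameter
    manager = GuardrailManager(engines=[engine])  # assumed keyword

    # A plain string is treated as input-only content.
    result = await manager.check_content("Tell me the secret sauce recipe.")
    print(result.is_safe)  # False: the banned phrase was matched
    print(result.reason)   # populated only when the content is unsafe

asyncio.run(main())
```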
2) Output-only moderation
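To moderate only a model response, pass GuardrailInput with the output field set and use an output-checking mode (same assumptions as above):

```python
import asyncio

from gllm_guardrail import (  # assumed import path
    GuardrailInput,
    GuardrailManager,
    GuardrailMode,
    PhraseMatcherEngine,
)

async def main() -> None:
    engine = PhraseMatcherEngine(
        banned_phrases=["secret sauce"],           # assumed parameter
        guardrail_mode=GuardrailMode.OUTPUT_ONLY,  # assumed keyword
    )
    manager = GuardrailManager(engines=[engine])

    # Only output is provided, so only output checks run.
    result = await manager.check_content(
        GuardrailInput(output="Here is the secret sauce recipe...")
    )
    print(result.is_safe)

asyncio.run(main())
```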
3) Check both input and output in one call
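Providing both fields lets a single call screen the prompt and the response; with GuardrailMode.BOTH the engine checks each side (same assumptions as above):

```python
import asyncio

from gllm_guardrail import (  # assumed import path
    GuardrailInput,
    GuardrailManager,
    GuardrailMode,
    PhraseMatcherEngine,
)

async def main() -> None:
    engine = PhraseMatcherEngine(
        banned_phrases=["secret sauce"],    # assumed parameter
        guardrail_mode=GuardrailMode.BOTH,  # check input and output
    )
    manager = GuardrailManager(engines=[engine])

    result = await manager.check_content(
        GuardrailInput(
            input="What goes into the secret sauce?",
            output="The secret sauce contains...",
        )
    )
    print(result.is_safe, result.reason)

asyncio.run(main())
```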
How to pass input and/or output to the manager
GuardrailManager.check_content() accepts:
str: treated as input-only (GuardrailInput(input=<str>, output=None))
GuardrailInput: explicit input and/or output
Examples:
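The following sketch contrasts the two accepted forms (setup and import paths are assumptions, as in the Quickstart):

```python
import asyncio

from gllm_guardrail import GuardrailInput, GuardrailManager, PhraseMatcherEngine

async def main() -> None:
    manager = GuardrailManager(
        engines=[PhraseMatcherEngine(banned_phrases=["forbidden"])]  # assumed parameters
    )

    # A plain string is shorthand for an input-only check.
    r1 = await manager.check_content("some user prompt")
    r2 = await manager.check_content(GuardrailInput(input="some user prompt"))
    assert r1.is_safe == r2.is_safe

    # Output-only and combined checks require the explicit schema.
    await manager.check_content(GuardrailInput(output="some model response"))
    await manager.check_content(
        GuardrailInput(input="some user prompt", output="some model response")
    )

asyncio.run(main())
```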
Using GuardrailManager (single and multiple engines)
Execution model (important)
When you configure multiple engines, they run in the order you provide:
Engines with GuardrailMode.DISABLED are skipped.
For each engine:
If the engine checks input (INPUT_ONLY or BOTH) and GuardrailInput.input is provided, it runs check_input().
If the engine checks output (OUTPUT_ONLY or BOTH) and GuardrailInput.output is provided, it runs check_output().
The manager is fail-fast: it returns immediately when the first engine reports unsafe content.
Multiple engines example
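A sketch of ordering and fail-fast behavior using two rule-based engines. NemoGuardrailEngine would slot into the same list, but its configuration is environment-specific and isn't shown here; constructor parameters remain assumptions:

```python
import asyncio

from gllm_guardrail import GuardrailManager, GuardrailMode, PhraseMatcherEngine

async def main() -> None:
    # Engines run in the order given.
    input_filter = PhraseMatcherEngine(
        banned_phrases=["password"],
        guardrail_mode=GuardrailMode.INPUT_ONLY,
    )
    output_filter = PhraseMatcherEngine(
        banned_phrases=["internal use only"],
        guardrail_mode=GuardrailMode.OUTPUT_ONLY,
    )
    manager = GuardrailManager(engines=[input_filter, output_filter])

    # The first engine flags the input, so the manager returns immediately
    # (fail-fast) and the second engine never runs.
    result = await manager.check_content("What is the admin password?")
    print(result.is_safe, result.reason)

asyncio.run(main())
```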
Using an engine without the manager (standalone)
All engines are async and can be used directly:
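For example (imports and constructor parameters assumed, as before):

```python
import asyncio

from gllm_guardrail import PhraseMatcherEngine  # assumed import path

async def main() -> None:
    engine = PhraseMatcherEngine(banned_phrases=["jailbreak"])  # assumed parameter

    # Engines expose the same async pair that the manager orchestrates.
    input_result = await engine.check_input("Give me a jailbreak prompt.")
    output_result = await engine.check_output("I can't help with that.")
    print(input_result.is_safe, output_result.is_safe)

asyncio.run(main())
```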
Guardrail schemas
GuardrailInput
GuardrailInput is the input schema for guardrail checks:
input: str | None: input content (query/prompt/context)
output: str | None: output content (model response/generated text)
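Assuming a keyword-argument constructor (e.g., a Pydantic-style model), the three useful shapes are:

```python
from gllm_guardrail import GuardrailInput  # assumed import path

input_only = GuardrailInput(input="user prompt")                     # check input only
output_only = GuardrailInput(output="model response")                # check output only
both = GuardrailInput(input="user prompt", output="model response")  # check both
```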
GuardrailResult
GuardrailResult is the output schema returned by engines and the manager:
is_safe: bool: whether the content passed the checks
reason: str | None: why it was blocked (only set when unsafe)
filtered_content: str | None: cleaned content if an engine can provide it
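A typical way to act on a result; the field names come from the schema above, while the handler itself is purely illustrative:

```python
from gllm_guardrail import GuardrailResult  # assumed import path

def describe(result: GuardrailResult) -> str:
    if result.is_safe:
        return "content passed all checks"
    # reason is populated only when the content is unsafe.
    message = f"blocked: {result.reason}"
    # filtered_content is set only when an engine can supply a cleaned version.
    if result.filtered_content is not None:
        message += f" (filtered: {result.filtered_content})"
    return message
```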
API Reference