Guardrail

gllm-guardrail | Tutorial: Guardrail | Use Case: Simple Guardrail | API Reference

Overview

Guardrails are a critical security and safety component that provides content filtering and safety checks for both user inputs and AI-generated responses. They act as a safety net, ensuring that your AI application adheres to safety standards, avoids harmful content, and remains on-topic.

The guardrail system is designed to be modular and extensible, allowing you to orchestrate multiple safety engines—from simple keyword matching to advanced LLM-based analysis using frameworks like NVIDIA's NeMo Guardrails.

What are Guardrails?

Guardrails screen content at two critical points:

  1. User input (queries, prompts, context) before it reaches an LLM

  2. Model output (responses) before it reaches end users

In gllm-guardrail, moderation is implemented via guardrail engines that are orchestrated by a GuardrailManager.

Guardrail engines (overview)

Engines implement a simple async interface: check_input() and check_output(). Each engine has a guardrail_mode that decides what it checks:

  • GuardrailMode.INPUT_ONLY: check only input (default)

  • GuardrailMode.OUTPUT_ONLY: check only output

  • GuardrailMode.BOTH: check both input and output

  • GuardrailMode.DISABLED: skip the engine entirely

This library ships with two engines:

  • PhraseMatcherEngine (rule-based): lightweight banned phrase detection.

  • NemoGuardrailEngine (LLM-based): NVIDIA NeMo Guardrails integration for more complex guardrails.

Supported engines

Currently, the component supports these guardrail engines:

Prerequisites

Prerequisites

This example specifically requires completion of all setup steps listed on the Prerequisites page.

Installation

Quickstart

1) Input-only moderation (string)

2) Output-only moderation

3) Check both input and output in one call

How to pass input and/or output to the manager

GuardrailManager.check_content() accepts:

  1. str: treated as input-only (GuardrailInput(input=<str>, output=None))

  2. GuardrailInput: explicit input and/or output

Examples:

Using GuardrailManager (single and multiple engines)

Execution model (important)

When you configure multiple engines, they run in the order you provide:

  1. Engines with GuardrailMode.DISABLED are skipped.

  2. For each engine:

    1. If the engine checks input (INPUT_ONLY or BOTH) and GuardrailInput.input is provided, it runs check_input().

    2. If the engine checks output (OUTPUT_ONLY or BOTH) and GuardrailInput.output is provided, it runs check_output().

  3. The manager is fail-fast: it returns immediately when the first engine reports unsafe content.

Multiple engines example

GuardrailManager is conservative by default:

  1. Empty input and empty output is treated as safe (empty_content_safe=True).

  2. If an engine raises an exception, the manager marks the content unsafe (error_conservative=True).

Using an engine without the manager (standalone)

All engines are async and can be used directly:

Guardrail schemas

GuardrailInput

GuardrailInput is the input schema for guardrail checks:

  • input: str | None: input content (query/prompt/context)

  • output: str | None: output content (model response/generated text)

GuardrailResult

GuardrailResult is the output schema returned by engines and manager:

  • is_safe: bool: whether the content passed the checks

  • reason: str | None: why it was blocked (only set when unsafe)

  • filtered_content: str | None: cleaned content if an engine can provide it

PhraseMatcherEngine returns filtered_content=None (it detects and blocks, it does not rewrite content).

API Reference

Last updated