Agent Content Guardrails
Implement modular content filtering and safety checks for AI agent interactions. This guide covers rule-based phrase matching and advanced LLM-based content safety engines, showing how to prevent harmful content in both user inputs and AI outputs.
When to use this guide: You need content safety controls for AI agents, want to block harmful prompts or filter inappropriate AI responses, or require local content filtering options for security-conscious deployments.
Who benefits: Security engineers, compliance teams, and developers building production AI applications that handle sensitive or regulated content.
Guardrails integrate seamlessly with agent execution—configure once and they work locally (via agent.run()) or remotely (via agent.deploy() + agent.run()). The SDK automatically handles middleware injection and serialization.
Overview
Agent Content Guardrails provide modular content filtering and safety checks for AI agent interactions. They help prevent harmful content in both user inputs and AI outputs, making them essential for security-conscious organizations and developers who need content safety controls.
Guardrails work by checking content against predefined safety rules before and after AI model interactions. When unsafe content is detected, execution is halted and a warning message is returned.
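This flow can be illustrated in plain Python (a conceptual model of the check-before/check-after pattern, not the SDK's actual implementation):

```python
# Conceptual model: content is checked before and after the model call,
# and execution halts with a warning on the first violation.
def guarded_run(user_input, model, is_safe):
    if not is_safe(user_input):                      # pre-check the input
        return "⚠️ Guardrail violation: unsafe input"
    output = model(user_input)                       # model interaction
    if not is_safe(output):                          # post-check the output
        return "⚠️ Guardrail violation: unsafe output"
    return output

banned = {"secret recipe"}

def is_safe(text):
    # Simple rule-based check, analogous to phrase matching.
    return not any(phrase in text.lower() for phrase in banned)

print(guarded_run("hello", lambda _t: "hi there", is_safe))  # passes both checks
```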
Key Features
Multiple Engine Types: Rule-based (PhraseMatcherEngine) and LLM-based (NemoGuardrailEngine) filtering
Flexible Configuration: Check inputs only, outputs only, or both
Fail-Fast Behavior: Stops on first safety violation for immediate response
Agent Integration: Seamless integration with existing agent workflows
Optional Dependencies: Works without requiring additional packages for basic usage
Installation
Guardrails are shipped as an optional dependency of the SDK. To use the full guardrails feature set, install the guardrails extra, which includes the gllm-guardrail package required for advanced LLM-based filtering. Basic phrase matching works with the core SDK alone and needs no additional packages.
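The install command (the same one shown in the Troubleshooting section below) is:

```shell
pip install glaip-sdk[guardrails]
```

In shells that expand square brackets (such as zsh), quote the argument: `pip install "glaip-sdk[guardrails]"`.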
Quick Start
Basic Phrase Matching
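A minimal sketch of rule-based filtering, using the class names documented in the API Reference below. The import path `glaip_sdk.guardrails` and the `engines=` keyword are assumptions, so adjust them to your SDK version:

```python
# Sketch only: import path and the `engines=` keyword are assumptions.
from glaip_sdk.guardrails import GuardrailManager, PhraseMatcherEngine

# banned_phrases is required; with no config, the engine defaults to
# GuardrailMode.INPUT_OUTPUT (checks both user input and AI output).
engine = PhraseMatcherEngine(banned_phrases=["confidential", "internal use only"])
manager = GuardrailManager(engines=[engine])
```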
Advanced LLM-Based Filtering
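A hedged sketch of LLM-based filtering. The zero-argument constructor shown here is an assumption; the NeMo Engine Guide linked below documents the real configuration options, and an LLM credential such as OPENAI_API_KEY must be set (see Troubleshooting):

```python
# Sketch only: import path and zero-argument constructor are assumptions;
# see the NeMo Engine Guide for the actual configuration options.
from glaip_sdk.guardrails import GuardrailManager, NemoGuardrailEngine

manager = GuardrailManager(engines=[NemoGuardrailEngine()])
```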
Engine Types
PhraseMatcherEngine (Rule-Based)
Best for simple, predictable content filtering based on exact phrase matches.
Configuration Options:
config: Optional BaseGuardrailEngineConfig object. If not provided, defaults to GuardrailMode.INPUT_OUTPUT.
guardrail_mode: Enum value: GuardrailMode.INPUT_ONLY, GuardrailMode.OUTPUT_ONLY, or GuardrailMode.INPUT_OUTPUT.
banned_phrases: List of phrases to block (required).
NemoGuardrailEngine (LLM-Based)
Advanced filtering using AI models for context-aware content safety analysis.
Direct Guardrail Usage
When to use: Validate content before sending it to agents or perform standalone content filtering outside of agent execution.
Guardrails can also be used independently of agents for ad-hoc content checking outside of an agent run.
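A sketch of standalone checking, assuming check_content() returns a GuardrailResult when content passes (the import path is also an assumption). Note the await, since check_content() is async:

```python
# Sketch only: import path and return behavior are assumptions.
import asyncio
from glaip_sdk.guardrails import GuardrailManager, PhraseMatcherEngine

async def main():
    manager = GuardrailManager(
        engines=[PhraseMatcherEngine(banned_phrases=["secret"])]
    )
    result = await manager.check_content("tell me the secret")  # must be awaited
    if not result.is_safe:
        print(f"Blocked: {result.reason}")

asyncio.run(main())
```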
Important Notes:
When using guardrails with agent.run(), async handling is automatic.
For direct usage, you must use await, since check_content() is an async method.
Use GuardrailInput when you want to check both user input and AI output in a single call.
The filtered_content field may contain sanitized content if the engine provides it.
Agent Integration
When to use: Integrate guardrails into agent workflows for automatic content filtering during agent execution.
Local Execution
When running agents locally, guardrails are enforced through middleware injection:
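A sketch of local enforcement. The Agent constructor and the `guardrails=` parameter shown here are assumptions about how engines are attached, so check your SDK version:

```python
# Sketch only: the Agent constructor and `guardrails=` parameter are assumptions.
from glaip_sdk import Agent
from glaip_sdk.guardrails import PhraseMatcherEngine

agent = Agent(
    name="support-bot",
    guardrails=[PhraseMatcherEngine(banned_phrases=["secret"])],
)
# Middleware is injected automatically: input is checked before the model
# call and output is checked after it.
response = agent.run("What is the secret?")
```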
Remote Execution
For deployed agents, guardrails are serialized and enforced by the backend:
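A sketch of the remote flow for an agent already configured with guardrail engines: the guardrail configuration is serialized at deploy time, and the backend applies the same checks on every run:

```python
# Sketch only: `agent` is an agent already configured with guardrail engines.
agent.deploy()                               # serializes the guardrail config
response = agent.run("What is the secret?")  # backend-enforced checks
```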
Configuration Patterns
When to use: Combine multiple engines, configure different checking modes, or customize guardrail behavior for specific use cases.
Multiple Engines
Combine multiple guardrail engines for comprehensive protection:
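A sketch of layered protection (import path and constructor arguments assumed). Because engines run sequentially and stop at the first violation, the fast phrase matcher is listed before the slower LLM-based engine:

```python
# Sketch only: import path and constructor arguments are assumptions.
from glaip_sdk.guardrails import (
    GuardrailManager,
    NemoGuardrailEngine,
    PhraseMatcherEngine,
)

manager = GuardrailManager(
    engines=[
        PhraseMatcherEngine(banned_phrases=["secret"]),  # fast, runs first
        NemoGuardrailEngine(),                           # slower, context-aware
    ]
)
```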
Input-Only vs Output-Only
Configure different checking modes:
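A sketch of the two single-sided modes; the import path is an assumption, while the class and enum names follow the API Reference below:

```python
# Sketch only: import path is an assumption.
from glaip_sdk.guardrails import (
    BaseGuardrailEngineConfig,
    GuardrailMode,
    PhraseMatcherEngine,
)

# Check user input only (block unsafe prompts before the model runs).
input_only = PhraseMatcherEngine(
    banned_phrases=["secret"],
    config=BaseGuardrailEngineConfig(guardrail_mode=GuardrailMode.INPUT_ONLY),
)

# Check AI output only (filter responses but accept any prompt).
output_only = PhraseMatcherEngine(
    banned_phrases=["secret"],
    config=BaseGuardrailEngineConfig(guardrail_mode=GuardrailMode.OUTPUT_ONLY),
)
```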
Checking Both Input and Output Together
Use GuardrailInput to check both user input and AI output in a single call:
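A sketch of a combined check; the import path is an assumption, while GuardrailInput's optional `input` and `output` fields follow the API Reference below:

```python
# Sketch only: import path is an assumption.
import asyncio
from glaip_sdk.guardrails import (
    GuardrailInput,
    GuardrailManager,
    PhraseMatcherEngine,
)

async def main():
    manager = GuardrailManager(
        engines=[PhraseMatcherEngine(banned_phrases=["secret"])]
    )
    result = await manager.check_content(
        GuardrailInput(input="user question", output="model answer")
    )
    print(result.is_safe)

asyncio.run(main())
```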
Disabling Guardrails
You can disable guardrails for a specific engine:
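A sketch of disabling a single engine (import path assumed). GuardrailMode.DISABLED keeps the engine in the configuration but skips its checks:

```python
# Sketch only: import path is an assumption. DISABLED keeps the engine in
# the configuration but skips its checks.
from glaip_sdk.guardrails import (
    BaseGuardrailEngineConfig,
    GuardrailMode,
    PhraseMatcherEngine,
)

engine = PhraseMatcherEngine(
    banned_phrases=["secret"],
    config=BaseGuardrailEngineConfig(guardrail_mode=GuardrailMode.DISABLED),
)
```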
Note: Disabled mode is useful for temporarily disabling guardrails during development or testing.
Error Handling
Guardrail Violations
When unsafe content is detected, execution halts and returns a warning message. Internally, a GuardrailViolationError exception is raised, which is caught and converted to a user-friendly warning message in the response.
With Agent Integration:
With Direct Usage:
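A sketch of catching the violation explicitly in direct usage; the import path and the `result` attribute on the exception are assumptions, while the source documents that the exception carries a GuardrailResult with details:

```python
# Sketch only: import path and the `result` attribute are assumptions.
import asyncio
from glaip_sdk.guardrails import (
    GuardrailManager,
    GuardrailViolationError,
    PhraseMatcherEngine,
)

async def main():
    manager = GuardrailManager(
        engines=[PhraseMatcherEngine(banned_phrases=["secret"])]
    )
    try:
        await manager.check_content("tell me the secret")
    except GuardrailViolationError as err:
        print(f"Blocked: {err.result.reason}")  # `result` attribute assumed

asyncio.run(main())
```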
Handling Exceptions
When using agents, violations are automatically converted to warning messages:
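A brief sketch, assuming `agent` is an agent configured with guardrail engines; the violation surfaces as a warning string in the response rather than an exception:

```python
# Sketch only: `agent` is an agent configured with guardrail engines.
response = agent.run("tell me the secret")
print(response)  # a warning such as "⚠️ Guardrail violation: [reason]"
```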
Key Points:
Guardrail violations are caught internally and converted to warning strings.
Users typically see warning messages like "⚠️ Guardrail violation: [reason]" in responses.
The underlying GuardrailViolationError exception contains a GuardrailResult with details.
For direct usage, you can catch GuardrailViolationError explicitly if needed.
Best Practices
Performance Considerations
PhraseMatcherEngine: Fast, low latency (<1ms) - ideal for high-throughput scenarios
NemoGuardrailEngine: Higher latency (~100-500ms depending on model) - use for advanced filtering when needed
Fail-fast behavior: Multiple engines stop on first violation, reducing unnecessary processing
Async/await requirements:
Direct usage (manager.check_content()) requires await, since it is async.
Agent integration (agent.run()) handles async automatically.
Multiple engines: Engines run sequentially until first violation, so total latency is sum of engines until violation
Performance tip: Place faster engines (PhraseMatcherEngine) first in the list to catch violations quickly
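The ordering advice above can be modeled in plain Python (an illustration of the fail-fast loop, not SDK code):

```python
# Pure-Python model of the fail-fast engine loop: engines run in order and
# evaluation stops at the first violation, so cheap checks placed first
# spare the expensive ones.
def run_engines(engines, content):
    for name, check in engines:        # engines: list of (name, check_fn)
        ok, reason = check(content)
        if not ok:
            return name, reason        # first violation wins
    return None, None                  # all engines passed

calls = []

def phrase_check(text):
    calls.append("phrase")             # fast rule-based check
    if "secret" in text:
        return False, "banned phrase"
    return True, None

def llm_check(text):
    calls.append("llm")                # stand-in for a slow LLM call
    return True, None

engines = [("phrase", phrase_check), ("llm", llm_check)]
violator, reason = run_engines(engines, "the secret plan")
# The slow engine was never invoked: the phrase matcher failed first.
```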
Configuration Tips
Start Simple: Begin with PhraseMatcherEngine for basic filtering
Layer Protection: Use multiple engines for comprehensive coverage
Test Thoroughly: Validate configurations with various inputs
Monitor Performance: Measure latency impact on agent response times
Security Recommendations
Troubleshooting
Common Issues
"Guardrails module not found"
Install optional dependencies:
pip install glaip-sdk[guardrails]
"NemoGuardrailEngine not available"
Ensure the gllm-guardrail package is installed.
Check that OPENAI_API_KEY or other required credentials are set.
"Agent execution hangs"
Check guardrail configuration for overly broad rules
Verify model endpoints are accessible
Review network connectivity for LLM-based engines
"False positives in phrase matching"
Review banned phrases for overly generic terms
Consider case sensitivity settings
Test with various input variations
Debugging
Enable detailed logging to troubleshoot issues:
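A minimal sketch using the standard logging module; the logger name "glaip_sdk" is an assumption, so use the namespace your SDK actually logs under:

```python
import logging

# Show debug-level detail from all loggers, including guardrail middleware.
logging.basicConfig(level=logging.DEBUG)
# The logger name "glaip_sdk" is an assumption; adjust to your SDK's namespace.
logging.getLogger("glaip_sdk").setLevel(logging.DEBUG)
```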
Remote vs Local Behavior
Local execution: Immediate blocking with detailed error messages
Remote execution: Backend-enforced with standardized warning format
API Reference
Core Classes
GuardrailManager: Orchestrates multiple guardrail engines
PhraseMatcherEngine: Rule-based phrase filtering
NemoGuardrailEngine: Advanced LLM-based content safety
GuardrailMiddleware: Integrates guardrails into agent execution
Configuration Schemas
GuardrailMode: Enum with values INPUT_ONLY, OUTPUT_ONLY, INPUT_OUTPUT, DISABLED
TopicSafetyMode: Enum with values ALLOWLIST, DENYLIST
BaseGuardrailEngineConfig: Common engine configuration class with a guardrail_mode parameter
GuardrailInput: Input schema for checking both input and output together (contains input and output fields)
Result Objects
GuardrailResult contains:
is_safe: Boolean indicating whether content passed all safety checks
reason: String explanation when content is blocked (None if safe)
filtered_content: Optional cleaned/sanitized content if the engine provides it (None if not available)
Input Schemas
GuardrailInput: Schema for checking both input and output together:
input: Optional string containing user input content
output: Optional string containing AI output content
Related Documentation
Security & privacy — apply PII masking, secure memory, and manage credentials responsibly.
Agents guide — design, update, and monitor agents with the SDK and CLI (REST is reference-only).
Additional Resources
GL SDK Documentation — Core SDK reference
NeMo Engine Guide — Advanced LLM-based guardrail configuration
Contact enterprise support for advanced configuration assistance