Guardrail
How to enable GL Chat Guardrail and configure it as needed
Guardrail Modes
The system supports four different operational modes:
disabled: No guardrail checks are performed. Use case: development/testing environments.
input_only: Only user input is checked. Use case: basic safety with minimal overhead.
output_only: Only AI responses are checked. Use case: when input filtering is handled elsewhere.
both: Both input and output are checked. Use case: maximum safety for production environments.
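The four modes above amount to a pair of switches, one for user input and one for AI responses. The Python sketch below illustrates that reading. The `GuardrailMode` enum and the two helper functions are hypothetical names chosen for illustration and are not taken from the GL Chat codebase; only the four mode strings come from this page.

```python
from enum import Enum


class GuardrailMode(str, Enum):
    """The four documented mode strings (the enum itself is illustrative)."""
    DISABLED = "disabled"
    INPUT_ONLY = "input_only"
    OUTPUT_ONLY = "output_only"
    BOTH = "both"


def checks_input(mode: GuardrailMode) -> bool:
    """True when user input should be checked under this mode."""
    return mode in (GuardrailMode.INPUT_ONLY, GuardrailMode.BOTH)


def checks_output(mode: GuardrailMode) -> bool:
    """True when AI responses should be checked under this mode."""
    return mode in (GuardrailMode.OUTPUT_ONLY, GuardrailMode.BOTH)


# Print which checks each mode turns on.
for mode in GuardrailMode:
    print(f"{mode.value:12} input={checks_input(mode)} output={checks_output(mode)}")
```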
Configuration
Basic Configuration
Guardrails can be configured through the admin interface using preset configurations. The main configuration fields are:
Core Settings
enable_guardrails (boolean): Master switch to enable/disable the entire guardrail system
guardrail_mode (string): Controls which content is checked (disabled, input_only, output_only, both)
Topic Safety Configuration
topic_safety_mode (string): Defines how topic filtering works
  allowlist: Only specified topics are allowed
  denylist: Only specified topics are blocked
  hybrid: Combines both allowlist and denylist approaches
  disabled: No topic-based filtering
allowed_topics (JSON string): List of topics that are explicitly allowed
banned_phrases (JSON string): List of specific phrases that should be blocked
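Before saving a preset, it can help to check the fields above for typos and malformed JSON. The following sketch does that under stated assumptions: the field names and allowed values come from this page, while the function `validate_guardrail_config`, the dict layout, and the rule that each list field is a JSON array of strings are illustrative guesses rather than the product's own validation logic.

```python
import json

GUARDRAIL_MODES = {"disabled", "input_only", "output_only", "both"}
TOPIC_SAFETY_MODES = {"allowlist", "denylist", "hybrid", "disabled"}


def validate_guardrail_config(config: dict) -> list[str]:
    """Return a list of problems found; an empty list means the config looks fine."""
    problems = []
    if not isinstance(config.get("enable_guardrails"), bool):
        problems.append("enable_guardrails must be a boolean")
    if config.get("guardrail_mode") not in GUARDRAIL_MODES:
        problems.append("guardrail_mode must be one of: " + ", ".join(sorted(GUARDRAIL_MODES)))
    if config.get("topic_safety_mode") not in TOPIC_SAFETY_MODES:
        problems.append("topic_safety_mode must be one of: " + ", ".join(sorted(TOPIC_SAFETY_MODES)))
    # Assumption: a missing list field is treated as an empty JSON array.
    for field in ("allowed_topics", "banned_phrases"):
        raw = config.get(field, "[]")
        try:
            value = json.loads(raw)
        except (TypeError, json.JSONDecodeError):
            problems.append(f"{field} is not valid JSON")
            continue
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{field} must be a JSON array of strings")
    return problems


# Example values are made up for illustration.
example = {
    "enable_guardrails": True,
    "guardrail_mode": "both",
    "topic_safety_mode": "hybrid",
    "allowed_topics": '["billing", "order status"]',
    "banned_phrases": '["ignore previous instructions"]',
}
print(validate_guardrail_config(example))  # prints []
```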
Database Configuration
The guardrail configuration is stored in the config_fields table and can be managed through the admin interface.
Advanced Configuration
Custom Colang Configuration
For advanced users, the system supports custom Colang configurations that define the exact business logic for content filtering.
Security Features
The guardrail system includes several security measures:
Circuit Breaker Pattern: Prevents system failures from causing widespread issues
Rate Limiting: Prevents abuse through excessive requests (60 requests/minute by default)
Input Sanitization: Removes malicious characters and limits content length
Error Handling: Conservative approach - marks content as unsafe on errors
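Two of the measures above, input sanitization and conservative error handling, can be illustrated together. This is a minimal sketch, not the system's actual implementation: the names `sanitize` and `is_safe`, the 4000-character limit, and the choice to strip Unicode control and format characters are all assumptions made for the example.

```python
import unicodedata
from typing import Callable

MAX_INPUT_CHARS = 4000  # assumed limit; the real value is not stated on this page


def sanitize(text: str, max_chars: int = MAX_INPUT_CHARS) -> str:
    """Drop control/format characters (keeping newlines and tabs), then truncate."""
    kept = [
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    ]
    return "".join(kept)[:max_chars]


def is_safe(text: str, checker: Callable[[str], bool]) -> bool:
    """Run a checker on sanitized text and fail closed: any exception counts as unsafe."""
    try:
        return bool(checker(sanitize(text)))
    except Exception:
        return False


def broken_checker(_text: str) -> bool:
    """Stand-in for a guardrail backend that is currently failing."""
    raise RuntimeError("backend unavailable")


print(is_safe("hello", lambda t: True))  # prints True
print(is_safe("hello", broken_checker))  # prints False
```

Failing closed trades availability for safety: when the checker itself breaks, legitimate content is blocked too, which is why the troubleshooting steps later on this page suggest monitoring guardrail failures.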
Setting Up Guardrails
Step 1: Enable Guardrails
Navigate to your pipeline preset configuration in the admin interface
Set enable_guardrails to true
Choose your desired guardrail_mode
Step 2: Configure Topic Safety
Set topic_safety_mode to your preferred approach:
Hybrid (recommended): Combines allowlist and denylist for maximum flexibility
Allowlist: Only allow specific business topics
Denylist: Block specific off-topic areas
Disabled: No topic-based filtering
Configure your topic lists:
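One way to avoid hand-written JSON errors is to build the two JSON-string fields from plain lists. The sketch below assumes the fields accept JSON arrays of strings; the topics and phrases are invented placeholders, and `topic_config` is only an illustrative container, not a GL Chat structure.

```python
import json

# Placeholder values: replace with topics and phrases that match your business.
allowed_topics = ["product pricing", "order status", "returns and refunds"]
banned_phrases = ["ignore previous instructions", "reveal your system prompt"]

topic_config = {
    "topic_safety_mode": "hybrid",
    # json.dumps guarantees correct quoting and escaping in the stored string.
    "allowed_topics": json.dumps(allowed_topics),
    "banned_phrases": json.dumps(banned_phrases),
}

# Print the values to paste into the corresponding preset fields.
for field, value in topic_config.items():
    print(f"{field} = {value}")
```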
Step 3: Set Up API Keys
Ensure your OpenAI API key is configured for the guardrail system:
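A quick pre-flight check can catch a missing key before the guardrail is enabled. The sketch assumes the key is supplied through the `OPENAI_API_KEY` environment variable, which is the name the official OpenAI client libraries read by default; your deployment may supply the key some other way, in which case adapt the lookup. The function name `openai_key_present` is hypothetical.

```python
import os


def openai_key_present(env=os.environ) -> bool:
    """True when OPENAI_API_KEY is set to a non-blank value in the given mapping."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not openai_key_present():
    print("OPENAI_API_KEY is not set; guardrail checks that call OpenAI will fail.")
```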
Core Safety Categories
The guardrail system includes comprehensive protection against:
1. Child Safety and Protection
Child exploitation content
Age-inappropriate content access
COPPA/FERPA violations
2. Violence and Physical Harm
Explicit violence promotion
Weapons development instructions
Self-harm and suicide content
Threats and intimidation
3. Hate Speech and Discrimination
Protected characteristic targeting
Harassment and bullying
Extremist ideologies
4. Privacy and Personal Information
Personal data extraction
Surveillance and stalking
Identity theft facilitation
5. Professional Services and Liability
Unauthorized medical advice
Legal advice and representation
Financial/investment advice
6. Misinformation and Deception
Factual misinformation
Election interference
Deepfakes and synthetic media
7. Regulated Industries and Compliance
HIPAA-protected medical information
Financial services/SOX compliance
Educational/FERPA protected content
8. Intellectual Property and Copyright
Copyright infringement
Trademark and brand misuse
9. System Manipulation and Security
Jailbreaking and prompt injection
Model extraction and reverse engineering
Best Practices
Configuration Recommendations
Start with Hybrid Mode: Use topic_safety_mode: hybrid for maximum flexibility
Define Clear Business Topics: Be specific about what your chatbot should handle
Regular Testing: Use the evaluation tools to test your configuration
Monitor Performance: Watch for false positives that block legitimate queries
Security Considerations
API Key Management: Store API keys securely and rotate them regularly
Rate Limiting: Adjust rate limits based on your expected traffic
Error Handling: Monitor guardrail failures and adjust configuration as needed
Regular Updates: Keep the guardrail rules updated with new threats and requirements
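To reason about what a given rate limit means for your traffic, it can help to see one common way such a limit is enforced. The sketch below is a generic sliding-window limiter, not the guardrail system's own code; the class name and parameters are illustrative, and only the default of 60 requests per minute comes from this page.

```python
import time
from collections import deque


class SlidingWindowRateLimiter:
    """Allow at most max_requests calls within any window_seconds interval."""

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock          # injectable so the limiter can be tested
        self._timestamps = deque()   # times of recently accepted requests

    def allow(self) -> bool:
        now = self._clock()
        # Discard timestamps that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            return False
        self._timestamps.append(now)
        return True


limiter = SlidingWindowRateLimiter()  # 60 requests per 60 seconds
accepted = sum(limiter.allow() for _ in range(100))
print(accepted)  # prints 60 when the 100 calls finish within one minute
```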
Performance Optimization
SpaCy Integration: Enable SpaCy for faster phrase matching (optional)
Caching: Use appropriate caching strategies for repeated queries
Concurrent Processing: Adjust concurrency limits based on your system capacity
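As an illustration of the caching point, the sketch below keeps recent verdicts in a small least-recently-used cache so that repeated queries skip the expensive check. It is a generic pattern rather than a documented GL Chat feature: `VerdictCache`, its parameters, and the normalization step (lowercasing and collapsing whitespace) are assumptions, and the normalization should be loosened or removed if your checker is sensitive to case or spacing.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class VerdictCache:
    """Small LRU cache of guardrail verdicts keyed by a hash of normalized text."""

    def __init__(self, checker: Callable[[str], bool], max_entries: int = 1024):
        self._checker = checker
        self._max_entries = max_entries
        self._entries = OrderedDict()

    @staticmethod
    def _key(text: str) -> str:
        # Assumption: case and whitespace differences do not change the verdict.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def check(self, text: str) -> bool:
        key = self._key(text)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        verdict = self._checker(text)
        self._entries[key] = verdict
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return verdict


calls = []


def slow_checker(text: str) -> bool:
    """Stand-in for an expensive guardrail call; records each invocation."""
    calls.append(text)
    return True


cache = VerdictCache(slow_checker)
cache.check("Where is my order?")
cache.check("where is   my order?")
print(len(calls))  # prints 1
```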
Troubleshooting
Common Issues
Guardrails Not Working
Check that enable_guardrails is set to true
Verify your OpenAI API key is configured
Ensure the guardrail mode is not set to disabled
False Positives
Review your allowed topics list
Check for overly restrictive banned phrases
Consider adjusting the topic safety mode
Performance Issues
Monitor rate limiting settings
Check for circuit breaker activations
Review input sanitization settings
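When investigating circuit breaker activations, it helps to know what the pattern does: after repeated failures it stops calling the failing backend for a cool-down period. The sketch below is a simplified, generic version with no half-open state. The class, its thresholds, and its error message are illustrative assumptions and do not describe the guardrail system's actual breaker.

```python
import time


class CircuitBreaker:
    """Pause calls after consecutive failures, then resume after a cool-down."""

    def __init__(self, failure_threshold=5, reset_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_seconds = reset_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # time the breaker opened, or None when closed

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        if self._clock() - self._opened_at >= self.reset_seconds:
            # Cool-down elapsed: close the breaker and allow fresh attempts.
            self._opened_at = None
            self._failures = 0
            return False
        return True

    def call(self, func, *args):
        if self.is_open:
            raise RuntimeError("circuit open: guardrail backend calls are paused")
        try:
            result = func(*args)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # a success resets the consecutive-failure count
        return result
```

Under this pattern, a burst of blocked or failed requests shortly after a backend outage usually points to an open breaker rather than to a configuration problem.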
Configuration Errors
Validate JSON format for topic lists
Check Colang syntax for custom configurations
Verify database migration status
Migration and Updates
Configuration Migration
For existing installations:
Default configuration will be set to input_only mode for backward compatibility
Review and update your topic lists based on your business requirements
Test the new configuration in a staging environment before deploying to production
Evaluation and Monitoring
Built-in Evaluation Tools
The system includes comprehensive evaluation tools:
Local Evaluation: Test against predefined datasets
Langfuse Integration: Track evaluation results and performance
Custom Metrics: Add your own evaluation criteria
Monitoring Recommendations
Track Block Rates: Monitor how often content is blocked
False Positive Analysis: Review blocked content for legitimate queries
Performance Metrics: Monitor response times and system load
Security Alerts: Set up alerts for suspicious activity patterns
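The first two recommendations reduce to simple ratios once guardrail decisions are logged. The sketch below assumes a hypothetical `GuardrailEvent` record with a `blocked` flag and a reviewer-set `reviewed_legitimate` flag; neither the record nor the functions are part of GL Chat, and real events would come from your own logs or from Langfuse.

```python
from dataclasses import dataclass


@dataclass
class GuardrailEvent:
    blocked: bool                      # did the guardrail block this message?
    reviewed_legitimate: bool = False  # set by a human reviewer after the fact


def block_rate(events) -> float:
    """Share of all checked messages that were blocked."""
    return sum(e.blocked for e in events) / len(events) if events else 0.0


def legitimate_share_of_blocked(events) -> float:
    """Share of blocked messages that a reviewer judged legitimate."""
    blocked = [e for e in events if e.blocked]
    if not blocked:
        return 0.0
    return sum(e.reviewed_legitimate for e in blocked) / len(blocked)


# Invented sample data for illustration.
events = [
    GuardrailEvent(blocked=False),
    GuardrailEvent(blocked=True),
    GuardrailEvent(blocked=True, reviewed_legitimate=True),
    GuardrailEvent(blocked=False),
]
print(block_rate(events))                   # prints 0.5
print(legitimate_share_of_blocked(events))  # prints 0.5
```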
API Reference
GuardrailManager Methods