Guardrail

How to enable the GL Chat Guardrail and configure it for your deployment.

Guardrail Modes

The system supports four operational modes:

| Mode | Description | Use Case |
| --- | --- | --- |
| disabled | No guardrail checks performed | Development/testing environments |
| input_only | Only user input is checked | Basic safety with minimal overhead |
| output_only | Only AI responses are checked | When input filtering is handled elsewhere |
| both | Both input and output are checked | Maximum safety for production environments |
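The four modes above reduce to two yes/no decisions per request: whether to check the user input, and whether to check the AI response. The sketch below illustrates that mapping. The `GuardrailMode` enum and the two helper functions are hypothetical names chosen for this example and are not taken from the GL Chat codebase.

```python
from enum import Enum


class GuardrailMode(str, Enum):
    """Hypothetical enum mirroring the four documented mode strings."""

    DISABLED = "disabled"
    INPUT_ONLY = "input_only"
    OUTPUT_ONLY = "output_only"
    BOTH = "both"


def checks_input(mode: GuardrailMode) -> bool:
    """Return True when user input should be checked under this mode."""
    return mode in (GuardrailMode.INPUT_ONLY, GuardrailMode.BOTH)


def checks_output(mode: GuardrailMode) -> bool:
    """Return True when AI responses should be checked under this mode."""
    return mode in (GuardrailMode.OUTPUT_ONLY, GuardrailMode.BOTH)


if __name__ == "__main__":
    for mode in GuardrailMode:
        print(f"{mode.value}: input={checks_input(mode)}, output={checks_output(mode)}")
```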

Configuration

Basic Configuration

Guardrails can be configured through the admin interface using preset configurations. The main configuration fields are:

Core Settings

enable_guardrails (boolean): Master switch to enable/disable the entire guardrail system
guardrail_mode (string): Controls which content is checked (disabled, input_only, output_only, both)

Topic Safety Configuration

topic_safety_mode (string): Defines how topic filtering works
- allowlist: Only specified topics are allowed
- denylist: Only specified topics are blocked
- hybrid: Combines both allowlist and denylist approaches
- disabled: No topic-based filtering
allowed_topics (JSON string): List of topics that are explicitly allowed

banned_phrases (JSON string): List of specific phrases that should be blocked
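Because several of these fields are plain strings with a fixed set of accepted values, it can help to validate a preset before saving it. The following is a minimal sketch of such a check, assuming the preset is available as a Python dictionary keyed by the field names above. The function name `validate_guardrail_config` is hypothetical, and the admin interface may already perform equivalent validation.

```python
import json

GUARDRAIL_MODES = {"disabled", "input_only", "output_only", "both"}
TOPIC_SAFETY_MODES = {"allowlist", "denylist", "hybrid", "disabled"}


def validate_guardrail_config(config):
    """Return a list of problems found; an empty list means no problems were detected."""
    problems = []
    if not isinstance(config.get("enable_guardrails"), bool):
        problems.append("enable_guardrails must be a boolean")
    if config.get("guardrail_mode") not in GUARDRAIL_MODES:
        problems.append(f"guardrail_mode must be one of {sorted(GUARDRAIL_MODES)}")
    if config.get("topic_safety_mode") not in TOPIC_SAFETY_MODES:
        problems.append(f"topic_safety_mode must be one of {sorted(TOPIC_SAFETY_MODES)}")
    # Both list fields are documented as JSON strings; a missing field is
    # treated as an empty list (an assumption made for this sketch).
    for field in ("allowed_topics", "banned_phrases"):
        raw = config.get(field, "[]")
        try:
            parsed = json.loads(raw)
        except (TypeError, json.JSONDecodeError):
            problems.append(f"{field} must be a valid JSON string")
            continue
        if not isinstance(parsed, list) or not all(isinstance(item, str) for item in parsed):
            problems.append(f"{field} must decode to a list of strings")
    return problems
```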

Database Configuration

The guardrail configuration is stored in the config_fields table and can be managed through the admin interface:

-- Example configuration field
INSERT INTO config_fields (
    name, type, constraints, default_value, ui_type, level
) VALUES (
    'guardrail_mode', 
    'str', 
    '["disabled", "input_only", "output_only", "both"]', 
    'input_only', 
    'dropdown', 
    'PRESET'
);
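To show how the `constraints` column could be used at runtime, the sketch below recreates the row in an in-memory SQLite database and checks a candidate value against the stored list. Only the column names come from the example above. The column types, the use of SQLite, and the `is_allowed_value` helper are assumptions made for illustration, and the production schema may differ.

```python
import json
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE config_fields (
        name TEXT PRIMARY KEY,
        type TEXT,
        constraints TEXT,
        default_value TEXT,
        ui_type TEXT,
        level TEXT
    )
    """
)
conn.execute(
    "INSERT INTO config_fields (name, type, constraints, default_value, ui_type, level) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    (
        "guardrail_mode",
        "str",
        json.dumps(["disabled", "input_only", "output_only", "both"]),
        "input_only",
        "dropdown",
        "PRESET",
    ),
)


def is_allowed_value(connection, field_name, value):
    """Check a value against the JSON list stored in the field's constraints column."""
    row = connection.execute(
        "SELECT constraints FROM config_fields WHERE name = ?", (field_name,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown config field: {field_name}")
    return value in json.loads(row[0])
```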

Advanced Configuration

Custom Colang Configuration

For advanced users, the system supports custom Colang configurations that define the exact business logic for content filtering:

# Example: Block competitor discussions
define user ask about competitor products
  "How does your product compare to [competitor]?"
  "Should I choose [competitor] instead?"
  "What are the advantages of [competitor]?"

define bot refuse competitor products
  "I can only provide information about our company's products and services."

define flow competitor products
  user ask about competitor products
  bot refuse competitor products

Security Features

The guardrail system includes several security measures:

Circuit Breaker Pattern: Stops calling a failing dependency after repeated errors, so a single failure does not cascade through the system
Rate Limiting: Prevents abuse through excessive requests (60 requests/minute by default)
Input Sanitization: Removes malicious characters and limits content length
Error Handling: Fails closed, meaning content is marked as unsafe whenever a check raises an error
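Two of the measures above, rate limiting and conservative error handling, are easy to picture in code. The sketch below shows one common way to implement each: a sliding-window limiter defaulting to 60 requests per 60 seconds, and a wrapper that treats any exception as an unsafe verdict. The class and function names are hypothetical, and the actual implementation may use a different algorithm.

```python
import time
from collections import deque


class SlidingWindowRateLimiter:
    """Allow at most max_requests calls within any window_seconds-long window."""

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._timestamps = deque()

    def allow(self):
        """Record and permit the request if the window has capacity, else reject it."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            return False
        self._timestamps.append(now)
        return True


def check_fail_closed(check, text):
    """Run check(text) and return its verdict; any exception counts as unsafe."""
    try:
        return bool(check(text))
    except Exception:
        return False
```

Failing closed trades availability for safety: an outage in the checking service blocks content rather than letting it through unchecked.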

Setting Up Guardrails

Step 1: Enable Guardrails

Navigate to your pipeline preset configuration in the admin interface
Set enable_guardrails to true
Choose your desired guardrail_mode

Step 2: Configure Topic Safety

Set topic_safety_mode to your preferred approach:
- Hybrid (recommended): Combines allowlist and denylist for maximum flexibility
- Allowlist: Only allow specific business topics
- Denylist: Block specific off-topic areas
- Disabled: No topic-based filtering

Configure your topic lists:

// Example allowed topics
[
  "company products and services",
  "technical support", 
  "product documentation",
  "company policies",
  "general information about the company"
]

// Example denied topics  
[
  "competitor products",
  "political discussions",
  "weather information",
  "sports discussions",
  "entertainment topics"
]
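The `//` lines in the example above are labels for the reader. JSON has no comment syntax, so only the arrays themselves should be stored. Since the topic fields are documented as JSON strings, one way to avoid hand-editing mistakes is to build them from Python lists. The helper below is a hypothetical sketch that trims whitespace, drops empty and duplicate entries (case-insensitively), and serializes the result.

```python
import json


def to_json_field(topics):
    """Clean a list of topic strings and serialize it to a JSON string."""
    seen = set()
    cleaned = []
    for topic in topics:
        normalized = topic.strip()
        key = normalized.lower()
        # Keep the first occurrence of each topic, ignoring case.
        if normalized and key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    return json.dumps(cleaned)


if __name__ == "__main__":
    print(to_json_field(["technical support", "company policies"]))
```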

Step 3: Set Up API Keys

Ensure your OpenAI API key is configured for the guardrail system:

export OPENAI_API_KEY="your-api-key-here"
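A missing key is easier to diagnose when it is detected at startup rather than on the first guardrail check. The sketch below shows one way to do that. The function `require_openai_api_key` is a hypothetical helper, and only the `OPENAI_API_KEY` variable name comes from the step above.

```python
import os


def require_openai_api_key(env=os.environ):
    """Return the API key, or raise a clear error if it is missing or blank."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; guardrail checks cannot run")
    return key
```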

Core Safety Categories

The guardrail system includes comprehensive protection against:

1. Child Safety and Protection

Child exploitation content
Age-inappropriate content access
COPPA/FERPA violations

2. Violence and Physical Harm

Explicit violence promotion
Weapons development instructions
Self-harm and suicide content
Threats and intimidation

3. Hate Speech and Discrimination

Protected characteristic targeting
Harassment and bullying
Extremist ideologies

4. Privacy and Personal Information

Personal data extraction
Surveillance and stalking
Identity theft facilitation

5. Professional Services and Liability

Unauthorized medical advice
Legal advice and representation
Financial/investment advice

6. Misinformation and Deception

Factual misinformation
Election interference
Deepfakes and synthetic media

7. Regulated Industries and Compliance

HIPAA-protected medical information
Financial services/SOX compliance
Educational/FERPA protected content

8. Intellectual Property and Copyright

Copyright infringement
Trademark and brand misuse

9. System Manipulation and Security

Jailbreaking and prompt injection
Model extraction and reverse engineering

Best Practices

Configuration Recommendations

Start with Hybrid Mode: Use topic_safety_mode: hybrid for maximum flexibility
Define Clear Business Topics: Be specific about what your chatbot should handle
Regular Testing: Use the evaluation tools to test your configuration
Monitor Performance: Watch for false positives that block legitimate queries

Security Considerations

API Key Management: Store API keys securely and rotate them regularly
Rate Limiting: Adjust rate limits based on your expected traffic
Error Handling: Monitor guardrail failures and adjust configuration as needed
Regular Updates: Keep the guardrail rules updated with new threats and requirements

Performance Optimization

SpaCy Integration: Enable SpaCy for faster phrase matching (optional)
Caching: Use appropriate caching strategies for repeated queries
Concurrent Processing: Adjust concurrency limits based on your system capacity
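As an illustration of the caching and concurrency points above, the sketch below wraps an arbitrary async check function with an in-memory cache and an `asyncio.Semaphore`. The class `CachedLimitedChecker` is hypothetical and is not how the guardrail system exposes these settings. It also assumes that a verdict for a given text stays valid until the configuration changes, at which point the cache should be discarded.

```python
import asyncio


class CachedLimitedChecker:
    """Wrap an async check with a result cache and a concurrency limit.

    Create instances inside a running event loop so the semaphore binds to it.
    """

    def __init__(self, check, max_concurrency=4):
        self._check = check
        self._semaphore = asyncio.Semaphore(max_concurrency)
        self._cache = {}  # unbounded for simplicity; bound it in real use

    async def __call__(self, text):
        # Serve repeated queries from the cache without touching the backend.
        if text in self._cache:
            return self._cache[text]
        # At most max_concurrency checks run against the backend at once.
        async with self._semaphore:
            result = await self._check(text)
        self._cache[text] = result
        return result
```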

Troubleshooting

Common Issues

Guardrails Not Working

Check that enable_guardrails is set to true
Verify your OpenAI API key is configured
Ensure the guardrail mode is not set to disabled

False Positives

Review your allowed topics list
Check for overly restrictive banned phrases
Consider adjusting the topic safety mode

Performance Issues

Monitor rate limiting settings
Check for circuit breaker activations
Review input sanitization settings

Configuration Errors

Validate JSON format for topic lists
Check Colang syntax for custom configurations
Verify database migration status

Migration and Updates

Configuration Migration

For existing installations:

Default configuration will be set to input_only mode for backward compatibility
Review and update your topic lists based on your business requirements
Test the new configuration in a staging environment before deploying to production

Evaluation and Monitoring

Built-in Evaluation Tools

The system includes comprehensive evaluation tools:

Local Evaluation: Test against predefined datasets
Langfuse Integration: Track evaluation results and performance
Custom Metrics: Add your own evaluation criteria

Monitoring Recommendations

Track Block Rates: Monitor how often content is blocked
False Positive Analysis: Review blocked content for legitimate queries
Performance Metrics: Monitor response times and system load
Security Alerts: Set up alerts for suspicious activity patterns
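The first three recommendations can be computed from a simple log of guardrail decisions. The sketch below assumes each decision is recorded with a blocked flag, a reviewer flag for blocked items later judged legitimate, and a latency. The `CheckRecord` shape and the `summarize` function are hypothetical and do not reflect the format of the built-in tools or of Langfuse.

```python
from dataclasses import dataclass


@dataclass
class CheckRecord:
    """Hypothetical log record for one guardrail decision."""

    blocked: bool
    reviewed_legitimate: bool = False  # set by a reviewer when a blocked item was legitimate
    latency_ms: float = 0.0


def summarize(records):
    """Compute block rate, false-positive share among blocks, and mean latency."""
    total = len(records)
    blocked = [r for r in records if r.blocked]
    false_positives = [r for r in blocked if r.reviewed_legitimate]
    return {
        "block_rate": len(blocked) / total if total else 0.0,
        "false_positive_share": len(false_positives) / len(blocked) if blocked else 0.0,
        "mean_latency_ms": sum(r.latency_ms for r in records) / total if total else 0.0,
    }
```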

API Reference

GuardrailManager Methods

# Check user input
result = await guardrail_manager.check_content(user_query)

# Check AI output  
result = await guardrail_manager.check_system_output(ai_response)

# Reconfigure guardrails
guardrail_manager.reconfigure(config_dict, colang_config)
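The snippet above does not show how the two checks fit around a model call. The sketch below illustrates one possible flow for the `both` mode, using a stub in place of the real manager so that it runs on its own. Only the method names `check_content` and `check_system_output` come from the reference above. The `is_safe` attribute, the `CheckResult` and `StubGuardrailManager` classes, the keyword-based stub logic, and the refusal message are assumptions made for illustration.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Hypothetical result shape; the real return type is not documented here."""

    is_safe: bool


class StubGuardrailManager:
    """Stand-in exposing the two documented async method names."""

    async def check_content(self, user_query):
        return CheckResult(is_safe="forbidden" not in user_query.lower())

    async def check_system_output(self, ai_response):
        return CheckResult(is_safe="forbidden" not in ai_response.lower())


REFUSAL = "Sorry, I can't help with that request."


async def guarded_reply(manager, user_query, generate):
    """Check the input, generate a reply, then check the reply before returning it."""
    if not (await manager.check_content(user_query)).is_safe:
        return REFUSAL
    ai_response = await generate(user_query)
    if not (await manager.check_system_output(ai_response)).is_safe:
        return REFUSAL
    return ai_response


async def echo_model(query):
    """Toy generator standing in for the actual LLM call."""
    return f"You asked: {query}"


if __name__ == "__main__":
    manager = StubGuardrailManager()
    print(asyncio.run(guarded_reply(manager, "What is your refund policy?", echo_model)))
```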

Support and Resources

Documentation

Community

NeMo Guardrails GitHub

Last updated 4 months ago