Guardrail

How to enable GL Chat Guardrail and configure it as needed

Guardrail Modes

The system supports four operational modes:

| Mode | Description | Use Case |
| --- | --- | --- |
| disabled | No guardrail checks performed | Development/testing environments |
| input_only | Only user input is checked | Basic safety with minimal overhead |
| output_only | Only AI responses are checked | When input filtering is handled elsewhere |
| both | Both input and output are checked | Maximum safety for production environments |
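
The four modes described above can be read as a mapping from mode name to the stages that get inspected. The following is a minimal sketch of that mapping, not the product's implementation; `GuardrailMode` and `stages_to_check` are hypothetical names introduced only for illustration.

```python
from enum import Enum


class GuardrailMode(str, Enum):
    """The four documented guardrail_mode values (class name is hypothetical)."""
    DISABLED = "disabled"
    INPUT_ONLY = "input_only"
    OUTPUT_ONLY = "output_only"
    BOTH = "both"


def stages_to_check(mode: str) -> set:
    """Return which stages ("input", "output") a given mode inspects."""
    parsed = GuardrailMode(mode)  # raises ValueError for an unknown mode
    if parsed is GuardrailMode.DISABLED:
        return set()
    if parsed is GuardrailMode.INPUT_ONLY:
        return {"input"}
    if parsed is GuardrailMode.OUTPUT_ONLY:
        return {"output"}
    return {"input", "output"}
```

Rejecting unknown mode strings early, as the enum lookup does here, helps catch typos in a preset before they silently turn checks off.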

Configuration

Basic Configuration

Guardrails can be configured through the admin interface using preset configurations. The main configuration fields are:

Core Settings

  • enable_guardrails (boolean): Master switch to enable/disable the entire guardrail system

  • guardrail_mode (string): Controls which content is checked (disabled, input_only, output_only, both)

Topic Safety Configuration

  • topic_safety_mode (string): Defines how topic filtering works

    • allowlist: Only specified topics are allowed

    • denylist: Only specified topics are blocked

    • hybrid: Combines both allowlist and denylist approaches

    • disabled: No topic-based filtering

  • allowed_topics (JSON string): List of topics that are explicitly allowed

  • banned_phrases (JSON string): List of specific phrases that should be blocked
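
To make the four topic_safety_mode values concrete, here is a minimal sketch of how a single topic might be evaluated under each mode. It is an assumed reading of the descriptions above, not the system's actual logic: the function name `is_topic_allowed`, the `denied` argument, and the rule that the denylist takes precedence in hybrid mode are all assumptions.

```python
def is_topic_allowed(topic, mode, allowed=(), denied=()):
    """Return True if `topic` passes the given topic_safety_mode.

    `denied` is a hypothetical denylist input; the semantics of each mode
    here are an assumed reading, not the product's documented behavior.
    """
    if mode == "disabled":
        return True
    if mode == "allowlist":
        return topic in allowed
    if mode == "denylist":
        return topic not in denied
    if mode == "hybrid":
        # Assumed precedence: a denied topic is always blocked,
        # otherwise the topic must appear on the allowlist.
        return topic not in denied and topic in allowed
    raise ValueError(f"unknown topic_safety_mode: {mode!r}")
```

For example, `is_topic_allowed("billing", "hybrid", allowed={"billing"}, denied={"politics"})` passes, while the same call with `"politics"` does not.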

Database Configuration

The guardrail configuration is stored in the config_fields table and can be managed through the admin interface.

Advanced Configuration

Custom Colang Configuration

For advanced users, the system supports custom Colang configurations that define the exact business logic for content filtering.

Security Features

The guardrail system includes several security measures:

  1. Circuit Breaker Pattern: Stops calling a failing guardrail dependency so that a single failure does not cascade through the system

  2. Rate Limiting: Prevents abuse through excessive requests (60 requests/minute by default)

  3. Input Sanitization: Removes malicious characters and limits content length

  4. Error Handling: Takes a conservative approach and marks content as unsafe whenever a check fails with an error
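
Two of the measures above, input sanitization and conservative error handling, can be sketched together. This is an illustrative sketch under stated assumptions: `sanitize`, `check_fail_closed`, and the `MAX_LENGTH` value are hypothetical, and the real system may remove a different set of characters or use a different length limit.

```python
import unicodedata

MAX_LENGTH = 4000  # hypothetical limit; the real default is not documented here


def sanitize(text: str, max_length: int = MAX_LENGTH) -> str:
    """Drop control characters (except newline and tab) and truncate."""
    cleaned = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    return cleaned[:max_length]


def check_fail_closed(text: str, checker) -> bool:
    """Return True only if `checker` positively reports the text as safe.

    Any exception is treated as "unsafe", mirroring the conservative
    error handling described above.
    """
    try:
        return bool(checker(sanitize(text)))
    except Exception:
        return False
```

Failing closed trades availability for safety: an outage in the checker blocks content rather than letting it through unchecked.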

Setting Up Guardrails

Step 1: Enable Guardrails

  1. Navigate to your pipeline preset configuration in the admin interface

  2. Set enable_guardrails to true

  3. Choose your desired guardrail_mode

Step 2: Configure Topic Safety

  1. Set topic_safety_mode to your preferred approach:

    • Hybrid (recommended): Combines allowlist and denylist for maximum flexibility

    • Allowlist: Only allow specific business topics

    • Denylist: Block specific off-topic areas

    • Disabled: No topic-based filtering

  2. Configure your topic lists (allowed_topics and banned_phrases) as JSON strings
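
Because allowed_topics and banned_phrases are stored as JSON strings, it can help to build them from ordinary lists instead of typing the JSON by hand. The sketch below shows one way to do that. The topic and phrase values are made-up examples, and `preset_update` is a hypothetical name for the values you would paste into the admin interface.

```python
import json

# Example values only; replace them with your own business topics and phrases.
allowed_topics = ["billing", "order status", "returns"]
banned_phrases = ["ignore previous instructions"]

# Field names match the configuration fields described in this guide.
preset_update = {
    "topic_safety_mode": "hybrid",
    "allowed_topics": json.dumps(allowed_topics),
    "banned_phrases": json.dumps(banned_phrases),
}

print(preset_update["allowed_topics"])
# prints: ["billing", "order status", "returns"]
```

Generating the strings with `json.dumps` avoids common hand-editing mistakes such as single quotes or trailing commas.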

Step 3: Set Up API Keys

Ensure your OpenAI API key is configured for the guardrail system.
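
A quick way to confirm that a key is visible to the process is to check the environment before starting the service. The sketch below assumes the key is supplied through the OPENAI_API_KEY environment variable, which is the variable the official OpenAI SDK reads by default. Whether your GL Chat deployment uses that variable or a different setting is an assumption to verify, and `openai_key_configured` is a hypothetical helper.

```python
import os


def openai_key_configured(env=None) -> bool:
    """Return True if a non-empty OPENAI_API_KEY is present.

    OPENAI_API_KEY is the OpenAI SDK's default variable; whether the
    guardrail system reads this same variable is an assumption.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not openai_key_configured():
    print("Warning: OPENAI_API_KEY is not set; guardrail checks may fail.")
```

The check only tests for presence. It does not validate the key against the API.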

Core Safety Categories

The guardrail system includes comprehensive protection against:

1. Child Safety and Protection

  • Child exploitation content

  • Age-inappropriate content access

  • COPPA/FERPA violations

2. Violence and Physical Harm

  • Explicit violence promotion

  • Weapons development instructions

  • Self-harm and suicide content

  • Threats and intimidation

3. Hate Speech and Discrimination

  • Protected characteristic targeting

  • Harassment and bullying

  • Extremist ideologies

4. Privacy and Personal Information

  • Personal data extraction

  • Surveillance and stalking

  • Identity theft facilitation

5. Professional Services and Liability

  • Unauthorized medical advice

  • Legal advice and representation

  • Financial/investment advice

6. Misinformation and Deception

  • Factual misinformation

  • Election interference

  • Deepfakes and synthetic media

7. Regulated Industries and Compliance

  • HIPAA-protected medical information

  • Financial services/SOX compliance

  • Educational/FERPA protected content

8. Intellectual Property

  • Copyright infringement

  • Trademark and brand misuse

9. System Manipulation and Security

  • Jailbreaking and prompt injection

  • Model extraction and reverse engineering

Best Practices

Configuration Recommendations

  1. Start with Hybrid Mode: Use topic_safety_mode: hybrid for maximum flexibility

  2. Define Clear Business Topics: Be specific about what your chatbot should handle

  3. Regular Testing: Use the evaluation tools to test your configuration

  4. Monitor Performance: Watch for false positives that block legitimate queries

Security Considerations

  1. API Key Management: Store API keys securely and rotate them regularly

  2. Rate Limiting: Adjust rate limits based on your expected traffic

  3. Error Handling: Monitor guardrail failures and adjust configuration as needed

  4. Regular Updates: Keep the guardrail rules updated with new threats and requirements
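
When tuning rate limits, it helps to see what a per-minute limit means in practice. The following is a minimal sliding-window sketch using the documented default of 60 requests per minute. It is not the system's own limiter: `SlidingWindowRateLimiter` and its parameters are hypothetical names.

```python
import time
from collections import deque


class SlidingWindowRateLimiter:
    """Allow at most `limit` calls per `window` seconds (illustrative only)."""

    def __init__(self, limit: int = 60, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Discard timestamps that have left the window.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) >= self.limit:
            return False
        self._calls.append(now)
        return True
```

Call `allow()` once per incoming request and reject the request when it returns False. Raising `limit` trades abuse protection for headroom during traffic spikes.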

Performance Optimization

  1. spaCy Integration: Enable spaCy for faster phrase matching (optional)

  2. Caching: Use appropriate caching strategies for repeated queries

  3. Concurrent Processing: Adjust concurrency limits based on your system capacity
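
The caching and concurrency recommendations can be combined in a small asyncio sketch: identical texts are checked only once, and a semaphore caps how many checks run at the same time. Everything named here (`check_all`, `check_one`, `MAX_CONCURRENT_CHECKS`) is hypothetical, and the system's own settings for these behaviors may work differently.

```python
import asyncio

MAX_CONCURRENT_CHECKS = 4  # hypothetical limit; tune to your system capacity


async def check_all(texts, check_one, limit=MAX_CONCURRENT_CHECKS):
    """Run `check_one` on each text with bounded concurrency.

    Repeated texts share a single task, which acts as a per-call cache.
    Results are returned in the same order as `texts`.
    """
    semaphore = asyncio.Semaphore(limit)
    tasks = {}

    async def guarded(text):
        async with semaphore:  # at most `limit` checks run at once
            return await check_one(text)

    def task_for(text):
        if text not in tasks:
            tasks[text] = asyncio.create_task(guarded(text))
        return tasks[text]

    return await asyncio.gather(*(task_for(t) for t in texts))
```

This cache lives only for one call. A cache shared across requests needs an eviction policy and must be invalidated when the guardrail configuration changes.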

Troubleshooting

Common Issues

Guardrails Not Working

  • Check that enable_guardrails is set to true

  • Verify your OpenAI API key is configured

  • Ensure the guardrail mode is not set to disabled
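
The three checks above can be scripted so they run the same way every time. This is a small sketch with a hypothetical `diagnose` helper. It assumes you have already loaded the preset into a dict and determined separately whether an API key is present.

```python
def diagnose(config: dict, api_key_present: bool) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if config.get("enable_guardrails") is not True:
        problems.append("enable_guardrails is not set to true")
    if config.get("guardrail_mode", "disabled") == "disabled":
        problems.append("guardrail_mode is missing or set to disabled")
    if not api_key_present:
        problems.append("OpenAI API key is not configured")
    return problems
```

For example, `diagnose({"enable_guardrails": True, "guardrail_mode": "disabled"}, True)` reports only the mode problem.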

False Positives

  • Review your allowed topics list

  • Check for overly restrictive banned phrases

  • Consider adjusting the topic safety mode

Performance Issues

  • Monitor rate limiting settings

  • Check for circuit breaker activations

  • Review input sanitization settings
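
To interpret circuit breaker activations, it helps to know the basic pattern: after a number of consecutive failures the breaker "opens" and rejects calls immediately until a cool-down has passed. The sketch below is a simplified illustration (it omits a separate half-open state). The class name, thresholds, and timings are hypothetical and are not the system's actual values.

```python
import time


class CircuitBreaker:
    """Open after `max_failures` consecutive failures; retry after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable for testing
        self.failures = 0
        self.opened_at = None

    @property
    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Cool-down elapsed: close the breaker and allow a trial call.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def call(self, func, *args):
        if self.is_open:
            raise RuntimeError("circuit open: call skipped")
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the failure count
        return result
```

Frequent activations usually point at an unhealthy upstream dependency, such as the LLM provider, and not at the guardrail rules themselves.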

Configuration Errors

  • Validate JSON format for topic lists

  • Check Colang syntax for custom configurations

  • Verify database migration status
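
Malformed JSON in a topic list is easy to catch before saving. Here is a minimal validation sketch using Python's standard json module. `parse_topic_list` is a hypothetical helper, and the rule that every entry must be a non-empty string is an assumption about what the fields expect.

```python
import json


def parse_topic_list(raw: str) -> list:
    """Parse a JSON-string field and require a list of non-empty strings."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg} at position {exc.pos}") from exc
    if not isinstance(value, list):
        raise ValueError("expected a JSON array")
    if not all(isinstance(item, str) and item.strip() for item in value):
        raise ValueError("every entry must be a non-empty string")
    return value
```

A frequent mistake is using single quotes: `parse_topic_list("['billing']")` raises ValueError, while `parse_topic_list('["billing"]')` returns the list.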

Migration and Updates

Configuration Migration

For existing installations:

  1. The default configuration is set to input_only mode for backward compatibility

  2. Review and update your topic lists based on your business requirements

  3. Test the new configuration in a staging environment before deploying to production

Evaluation and Monitoring

Built-in Evaluation Tools

The system includes comprehensive evaluation tools:

  • Local Evaluation: Test against predefined datasets

  • Langfuse Integration: Track evaluation results and performance

  • Custom Metrics: Add your own evaluation criteria

Monitoring Recommendations

  1. Track Block Rates: Monitor how often content is blocked

  2. False Positive Analysis: Review blocked content for legitimate queries

  3. Performance Metrics: Monitor response times and system load

  4. Security Alerts: Set up alerts for suspicious activity patterns
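
The first two recommendations reduce to two simple ratios: how many requests are blocked, and how many of those blocks turn out to be legitimate queries. The sketch below tracks both in memory. `GuardrailStats` is a hypothetical class, and in practice the counts would come from your logs or from a tracing tool such as Langfuse.

```python
from dataclasses import dataclass


@dataclass
class GuardrailStats:
    total: int = 0
    blocked: int = 0
    false_positives: int = 0  # blocked items later judged legitimate

    def record(self, blocked: bool, legitimate: bool = False) -> None:
        """Count one request; `legitimate` only matters for blocked requests."""
        self.total += 1
        if blocked:
            self.blocked += 1
            if legitimate:
                self.false_positives += 1

    @property
    def block_rate(self) -> float:
        """Fraction of all requests that were blocked."""
        return self.blocked / self.total if self.total else 0.0

    @property
    def false_positive_share(self) -> float:
        """Fraction of blocked requests that were actually legitimate."""
        return self.false_positives / self.blocked if self.blocked else 0.0
```

A rising block rate together with a rising false-positive share usually means the topic lists or banned phrases are too restrictive.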

API Reference

GuardrailManager Methods

Support and Resources

Documentation

Community
