Advanced Configuration
This guide covers advanced configuration options for custom pipelines, including preset configurations, runtime configurations, and building tools.
Optional Module Attributes
Add these optional attributes to your pipeline.py:
```python
from glchat_be.config.pipeline.my_custom_pipeline.preset_config import MyCustomPresetConfig
from glchat_be.config.pipeline.my_custom_pipeline.runtime_config import MyCustomRuntimeConfig

# Optional: Override pipeline name
name = "my-custom-pipeline-v2"

# Optional: Preset configuration class
preset_config_class = MyCustomPresetConfig

# Optional: Runtime configuration class
additional_config_class = MyCustomRuntimeConfig

# Required functions
async def build(...): pass
def build_initial_state(...): pass

# Optional: Tools function
async def build_tools(...): pass
```

Preset Configuration
Preset configuration defines pipeline-specific settings that are configured in the Admin Dashboard when building/creating the chatbot. These "preset" values serve as defaults for the pipeline and can optionally be overridden at runtime via RuntimeConfig.
When values are set: Configuration time (Admin Dashboard)
When values are used: Pipeline build time (when constructing the pipeline)
💡 Coming from Database Migration? If you added custom configuration fields in your migration file, you must create a PresetConfig class with matching fields. See the Database Migration Guide for the migration side of custom fields.
Creating Preset Config
Create a preset_config.py file in your pipeline directory:
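A minimal sketch of such a class, assuming Pydantic models. The field names (`custom_retrieval_top_k`, `enable_custom_reranking`) are hypothetical examples from this guide, not required names:

```python
# preset_config.py - hypothetical PresetConfig sketch.
# Field names, types, and defaults must mirror your migration fields.
from pydantic import BaseModel, Field


class MyCustomPresetConfig(BaseModel):
    """Preset configuration for the custom pipeline."""

    custom_retrieval_top_k: int = Field(
        default=20, ge=1, le=100,
        description="Number of chunks to retrieve from the knowledge base.",
    )
    enable_custom_reranking: bool = Field(
        default=False,
        description="Whether to rerank retrieved chunks.",
    )
```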
Important Notes:
- ✅ Field names must match exactly with migration field names (if adding via migration)
- ✅ Field types must match the migration `type` field (`int`, `float`, `bool`, `str`)
- ✅ Default values should match the migration `default_value`
- ✅ Use Pydantic `Field()` for validation constraints (`ge`, `le`, `min_length`, etc.)
- ✅ Add comprehensive docstrings for each field
Registering Preset Config
Register it in pipeline.py:
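For instance (the class is stubbed inline here so the sketch is self-contained; in practice you import it from `preset_config.py` as shown at the top of this guide):

```python
# pipeline.py - registration is a module-level attribute.
class MyCustomPresetConfig:  # stub; normally imported from preset_config.py
    ...


preset_config_class = MyCustomPresetConfig
```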
Accessing Preset Config
PresetConfig values are accessible during pipeline build time via the pipeline_config dictionary:
Accessible in:
- `build()` function
- `build_initial_state()` function
- `build_tools()` function (if defined)
Example:
Note: PresetConfig values are NOT accessible during pipeline execution (in steps). See RuntimeConfig section below for how to make them accessible during execution.
Preset Config vs Migration Fields
When adding custom configuration fields, you need to define them in two places:
| | Migration | PresetConfig |
| --- | --- | --- |
| Purpose | Stores field definitions in database | Provides type safety and validation |
| When to create | When adding custom fields | Always when adding custom fields |
| Field definition | Dict with `type`, `default_value`, `ui_type` | Pydantic `Field` with type hints |
| Validation | Basic (via UI type) | Comprehensive (Pydantic validators) |
| Location | `migration/versions/*.py` | `glchat_be/config/pipeline/<name>/preset_config.py` |
Example Mapping:
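A hypothetical side-by-side, with the migration side shown as the dict shape described above (`type`, `default_value`, `ui_type`); the exact migration format may differ in your codebase:

```python
# Migration side (hypothetical dict shape):
migration_fields = {
    "custom_retrieval_top_k": {
        "type": "int",
        "default_value": 20,
        "ui_type": "number",
    },
}

# PresetConfig side must match the name, type, and default:
#   custom_retrieval_top_k: int = Field(default=20)
matching_default = migration_fields["custom_retrieval_top_k"]["default_value"]
```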
Key Rule: Field names, types, and defaults must match between migration and PresetConfig.
Runtime Configuration
Runtime configuration serves two critical purposes:
1. Makes PresetConfig values accessible during pipeline execution (in steps via `input_map`)
2. Allows per-request overrides of preset values
Key Concept: PresetConfig values are ONLY accessible in build(), build_initial_state(), and build_tools(). To access them during pipeline execution (in steps), you MUST also define them in RuntimeConfig.
When values are set:
Defaults from PresetConfig (configuration time in Admin Dashboard)
Can be overridden per API request (execution time)
When values are used: Pipeline execution (per request)
Critical Rule:
PresetConfig only: Values accessible during build time, NOT during execution
PresetConfig + RuntimeConfig: Values accessible during build time AND execution time
RuntimeConfig acts as a gate: Only fields defined in RuntimeConfig are available in execution steps
Both should contain the same fields when you want preset values accessible during execution. RuntimeConfig values take precedence over PresetConfig values when provided in API requests.
Creating Runtime Config
To make PresetConfig values accessible during execution, create a RuntimeConfig with the same fields as your PresetConfig.
Create a runtime_config.py file in your pipeline directory:
Key Points:
- RuntimeConfig makes PresetConfig values accessible during pipeline execution
- Fields should match your PresetConfig fields (same names, types, defaults)
- Values come from PresetConfig by default and can be overridden per request
- Often includes a companion `StrEnum` for key names to avoid magic strings
Relationship with PresetConfig:
What happens:
1. User sets `retrieval_top_k = 30` in Admin Dashboard (stored in PresetConfig)
2. During execution, RuntimeConfig gets the value `30` from PresetConfig
3. The component accesses `retrieval_top_k = 30` via `input_map`
4. User can override to `retrieval_top_k = 10` in an API request
5. For that request, the component gets `10` instead of `30`
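The precedence reduces to a dict merge: request overrides win over preset defaults. A pure-Python miniature:

```python
preset_defaults = {"retrieval_top_k": 30}    # set in Admin Dashboard
request_overrides = {"retrieval_top_k": 10}  # sent with one API request

effective = {**preset_defaults, **request_overrides}  # override wins
# A request without overrides falls back to the preset value:
fallback = {**preset_defaults}
```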
Registering Runtime Config
Register it in pipeline.py:
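For instance (class stubbed inline; normally imported from `runtime_config.py`):

```python
# pipeline.py - runtime config registration is a module-level attribute.
class MyCustomRuntimeConfig:  # stub; normally imported from runtime_config.py
    ...


additional_config_class = MyCustomRuntimeConfig
```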
Accessing Runtime Config (and PresetConfig values during execution)
Runtime config values, along with state values and general config values, are merged into a single state dictionary during pipeline execution. You access these values through the input_map in your pipeline steps.
This is how you access PresetConfig values during pipeline execution!
How the merged state works:
The pipeline execution merges three sources into one state dictionary:
1. State (from the `build_initial_state()` return)
2. Runtime config (ONLY fields defined in the RuntimeConfig class)
   - Values come from PresetConfig defaults
   - Can be overridden per API request
3. General config (from `GeneralPipelineConfig`)
Critical Points:
✅ Only fields defined in RuntimeConfig are merged into execution state
❌ PresetConfig fields NOT in RuntimeConfig are NOT accessible in steps
✅ RuntimeConfig fields get their default values from PresetConfig
⚠️ Avoid duplicate keys across these three sources!
Accessing values in pipeline steps:
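A pure-Python miniature of the mechanism; the real step/component API is GLChat's, and only the dict-lookup logic is shown here:

```python
def retrieve(top_k: int, query: str) -> str:
    # Stand-in for a pipeline component.
    return f"retrieving {top_k} chunks for {query!r}"

# A step declaration maps component parameters to merged-state keys:
input_map = {"top_k": "custom_retrieval_top_k", "query": "standalone_query"}

merged_state = {
    "standalone_query": "What is RAG?",   # from build_initial_state()
    "custom_retrieval_top_k": 20,         # from RuntimeConfig (preset default)
}

# The runtime resolves input_map against the merged state:
kwargs = {param: merged_state[key] for param, key in input_map.items()}
result = retrieve(**kwargs)
```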
Key Points:
- `input_map` format: `"component_parameter": "state_key"`
- State keys can come from the state dict, runtime config, or general config
- All merged automatically: runtime config and general config are automatically merged with state
- You don't need to manually add runtime config values to state; they're already accessible
- Avoid duplicates: ensure keys don't conflict across state, runtime config, and general config
Example mapping sources:
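One `input_map` can pull from all three sources at once (key names are the hypothetical ones used in this guide):

```python
input_map = {
    "query": "standalone_query",        # from state (build_initial_state)
    "top_k": "custom_retrieval_top_k",  # from RuntimeConfig
    "user": "user_id",                  # from GeneralPipelineConfig
}
```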
Note: Runtime config values (from PresetConfig/RuntimeConfig) and general config values are automatically merged into the state. You can access them directly via input_map without adding them to build_initial_state() return.
General Pipeline Config (Available to All Pipelines)
All pipelines have access to GeneralPipelineConfig, which contains common runtime information such as `request_id`, `user_id`, and `conversation_id`.
Accessing General Pipeline Config:
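A sketch using a dataclass stand-in for `GeneralPipelineConfig`; the exact `build_initial_state()` signature is framework-defined, so treat the parameter list as an assumption:

```python
from dataclasses import dataclass


@dataclass
class GeneralConfigStub:
    """Stand-in for GeneralPipelineConfig with two common fields."""

    user_id: str
    conversation_id: str


def build_initial_state(previous_state: dict,
                        general_config: GeneralConfigStub, **kwargs) -> dict:
    # Common runtime context can be copied into the initial state.
    return {
        "user_id": general_config.user_id,
        "conversation_id": general_config.conversation_id,
    }


state = build_initial_state({}, GeneralConfigStub("u-1", "c-1"))
```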
Important: Values returned from build_initial_state() are merged with runtime config and general config. During pipeline execution, all these sources are combined into one state dictionary accessible via input_map.
Merged State = State + RuntimeConfig + GeneralConfig
Common GeneralPipelineConfig Fields:
- User Context: `user_id`, `conversation_id`, `source`
- Knowledge Base: `knowledge_base_id`, `connectors`, `attachments`
- Search Settings: `search_type`, `normal_search_top_k`, `smart_search_top_k`, `web_search_top_k`
- System Settings: `enable_guardrails`, `use_memory`, `augment_context`
- Model Config: `model_name`, `model_kwargs`, `hyperparameters`
See glchat_be/config/pipeline/general_pipeline_config.py for the complete list of available fields.
Pipeline Execution Flow
When a user sends a message, your custom pipeline is not the only thing that executes. GLChat runs a series of automated stages before and after your pipeline to handle common tasks like guardrails, preprocessing, and postprocessing.
Understanding this flow is crucial for building effective pipelines.
Complete Pipeline Architecture

Key Point: You only need to implement Your Pipeline. Everything else is handled automatically by the system.
Stages Before Your Pipeline
1. Guardrails (Optional)
Checks user input for harmful or disallowed content
If detected, stops the entire pipeline before execution
Can be disabled from Admin Dashboard
2. DPO (Document Processing Orchestrator) (Optional)
Processes uploaded files (PDFs, images, etc.) from chat UI
Preprocessing later decides whether to use DPO output or native model handling
Can be disabled from Admin Dashboard
3. Preprocessing (Automatic)
This is where the "auto-magic" happens. Preprocessing handles:
- Retrieve Chat History: Pulls all previous messages, metadata, and attachments
- Process Attachments:
  - If the model can read the file directly (e.g., GPT-4 with vision) → use it
  - If not → use DPO's processed version
  - If neither can handle it → skip gracefully
- Anonymize User Query (if enabled): Masks PII and stores it as `masked_user_query`
- Generate Standalone Query: Creates a condensed query based on the user input + recent messages
- Check Cache (if enabled): Returns a cached response instantly if found (sets `cache_hit = True`)
- Retrieve Memory (if enabled): Fetches user memory from past conversations
All preprocessing output is passed to your pipeline via previous_state.
4. Router (Automatic)
Forwards preprocessed data to your pipeline
No configuration needed
Your Pipeline
This is where you focus! Your custom pipeline receives previous_state containing all the preprocessed data.
Important: Your build_initial_state() must accept previous_state:
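A hedged sketch of the shape (the exact parameter list is framework-defined; the point is that `previous_state` flows in and its fields are passed through):

```python
def build_initial_state(previous_state: dict, **kwargs) -> dict:
    return {
        **previous_state,   # keep preprocessing output (required fields)
        "response": "",     # filled in by your pipeline
        "events": [],
        "related": [],
    }
```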
Available in previous_state:
- `user_query`: Original user message
- `standalone_query`: Condensed query for retrieval
- `masked_user_query`: Anonymized query (if anonymization enabled)
- `chat_history`: List of previous messages
- `attachments`: Processed attachments
- `cache_hit`: Whether a cached response was found
- `memory`: Retrieved user memory (if enabled)
- `event_emitter`: Event emitter for streaming
- And more... (see `PreprocessingState`)
Stages After Your Pipeline
5. Postprocessing (Automatic)
After your pipeline completes, postprocessing handles:
Save Cache (if enabled): Stores response for future cache hits
Save Chat History: Saves the Q&A pair including metadata, attachments, PII mappings
Save Memory (if enabled): Stores the conversation if considered meaningful
Required State Fields for Postprocessing:
Your pipeline state MUST include these fields for postprocessing to work:
| Field | Type | Description |
| --- | --- | --- |
| `events` | `list[dict[str, Any]]` | Events for thinking & activity tracking. Empty list if none. |
| `related` | `list[str]` | Related topics or concepts. Empty list if none. |
| `response` | `str` | The response to the user. |
Most other required fields are already provided by preprocessing - just pass them through!
Example: Complete Pipeline with Preprocessing
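A pure-Python miniature of the whole flow: preprocessing output enters via `previous_state`, a step produces the response, and the final state carries the fields postprocessing requires. This is a sketch, not the real GLChat API:

```python
def build_initial_state(previous_state: dict, **kwargs) -> dict:
    # Pass preprocessing output through and add postprocessing fields.
    return {**previous_state, "response": "", "events": [], "related": []}


def generate_step(state: dict) -> dict:
    # Stand-in for a generation component fed by the standalone query.
    state["response"] = f"Answer to: {state['standalone_query']}"
    return state


previous_state = {"user_query": "hi", "standalone_query": "hi (standalone)"}
final_state = generate_step(build_initial_state(previous_state))
```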
Benefits of This Architecture
✅ Less boilerplate: No need to implement chat history retrieval, attachment processing, etc.
✅ Automatic caching: Cache checking and saving handled for you
✅ Built-in PII protection: Anonymization handled if enabled
✅ Memory support: User memory retrieval and saving automatic
✅ Focus on logic: Spend time on your custom pipeline behavior, not infrastructure
Pipeline Config Resolver Utility
GLChat provides a PipelineConfigResolver utility class that simplifies access to common pipeline configurations like LLM models, embeddings, and other frequently used settings.
What is PipelineConfigResolver?
PipelineConfigResolver is a helper class that:
Provides easy access to common pipeline configurations
Handles default values automatically
Lazily initializes expensive resources (LM invoker, EM invoker, etc.)
Validates configuration values
Using PipelineConfigResolver
Import and initialize:
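The import path is omitted deliberately (locate `PipelineConfigResolver` in your `glchat_be` checkout); the constructor is assumed here to take the `pipeline_config` dict described in this guide:

```python
# Inside build() - constructor shape is an assumption, verify locally.
resolver = PipelineConfigResolver(pipeline_config)
```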
Available Properties
Model Configuration:
- `model_name`: LLM model name (e.g., `"openai/gpt-4"`)
- `model_kwargs`: Model-specific kwargs
- `model_env_kwargs`: Environment-specific model kwargs
- `model_config`: `ModelConfig` tuple (name, kwargs, env_kwargs)

Vectorizer/Embeddings:
- `vectorizer_config`: `VectorizerConfig` (name, model, kwargs)
- `em_invoker`: Embedding model invoker (`BaseEMInvoker`)
- `langchain_embeddings`: Langchain-compatible embeddings (`Embeddings`)

Invokers:
- `lm_invoker`: Language model invoker (`BaseLMInvoker`)
- `em_invoker`: Embedding model invoker (`BaseEMInvoker`)

Pipeline Settings:
- `prompt_context_char_threshold`: Character limit for prompt context
- `chat_history_limit`: Maximum chat history messages
- `reference_formatter_batch_size`: Batch size for reference formatting
- `reference_formatter_threshold`: Threshold for the reference formatter
- `strategy_batch_size`: Batch size for the generation strategy
- `generation_strategy`: Generation strategy (`"stuff"` or `"refine"`)
- `enable_guardrails`: Whether guardrails are enabled
- `support_multimodal`: Whether multimodal is supported

Retrievers and Rerankers:
- `kb_retriever`: Knowledge base retriever (`BasicVectorRetriever`)
- `reranker`: Reranker instance (`BaseReranker | None`)

Other:
- `modality_converter`: Modality converter for multimodal (`BaseModalityConverter | None`)
- `rago_pipeline`: RAGO pipeline name
- `pipeline_preset_id`: Pipeline preset ID
Example: Using PipelineConfigResolver
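A stand-in illustrating the two behaviors named above: defaults handled inside properties, and expensive resources created lazily. This is a sketch, not GLChat's actual `PipelineConfigResolver`:

```python
from functools import cached_property


class ResolverSketch:
    def __init__(self, pipeline_config: dict):
        self._config = pipeline_config

    @property
    def model_name(self) -> str:
        # Default applied automatically when the config omits the key.
        return self._config.get("model_name", "openai/gpt-4")

    @cached_property
    def lm_invoker(self):
        # Built once, on first access (stands in for a real invoker).
        return {"model": self.model_name}


resolver = ResolverSketch({"model_name": "anthropic/claude-3"})
```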
Benefits of Using PipelineConfigResolver
✅ Cleaner code: No need to manually extract and validate config values
✅ Default values: Automatically handles missing configurations with sensible defaults
✅ Type safety: Returns properly typed objects (invokers, embeddings, etc.)
✅ Lazy loading: Expensive resources only initialized when accessed
✅ Consistency: Use the same configuration patterns across all pipelines
Direct Access vs PipelineConfigResolver
Without PipelineConfigResolver:
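A sketch of manual extraction: every value needs its own get/validate/default step (the default values shown are illustrative, not GLChat's):

```python
pipeline_config = {"model_name": "openai/gpt-4"}  # example input

model_name = pipeline_config.get("model_name") or "openai/gpt-4"
chat_history_limit = int(pipeline_config.get("chat_history_limit", 10))
if chat_history_limit < 1:
    raise ValueError("chat_history_limit must be >= 1")
```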
With PipelineConfigResolver:
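The same values via the resolver (a fragment; property names are taken from the list above, and defaults and lazy initialization are handled internally):

```python
resolver = PipelineConfigResolver(pipeline_config)
model_name = resolver.model_name              # default handled for you
chat_history_limit = resolver.chat_history_limit
lm = resolver.lm_invoker                      # lazily initialized
```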
Building Tools
Tools allow you to expose your entire pipeline as a callable component via API. This enables your pipeline to be used as a tool within agent workflows or called directly through the /components/{component_id}/run endpoint.
How Tools Work
When you implement build_tools(), your pipeline becomes accessible as a tool:
1. Your pipeline is converted to a tool using `.as_tool()`
2. It is wrapped in a `ToolProcessor` for input/output processing
3. It is exposed via the API endpoint: `POST /components/{chatbot_id}:{tool_name}/run`
Creating Tools
Implement the build_tools() function in your pipeline.py:
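A runnable sketch of the shape of `build_tools()`, with all GLChat classes replaced by inline stubs so it stands alone. The real code uses your `build()`, a Pydantic input model, `GeneralPipelineConfig`, and GLChat's `ToolProcessor` base class:

```python
import asyncio


class ToolProcessorStub:
    def __init__(self, tool):
        self.tool = tool

    def preprocess(self, payload: dict) -> dict:
        return payload             # validate / enrich inputs here

    def postprocess(self, output) -> dict:
        return {"result": output}  # format the tool's output here


class PipelineStub:
    def as_tool(self, name: str):
        return name                # real pipelines return a Tool object


async def build(pipeline_config: dict, **kwargs) -> PipelineStub:
    return PipelineStub()


async def build_tools(pipeline_config: dict, **kwargs) -> list:
    pipeline = await build(pipeline_config, **kwargs)
    pipeline._input_type = dict       # real code: a Pydantic BaseModel
    pipeline._context_schema = dict   # real code: GeneralPipelineConfig
    tool = pipeline.as_tool(name="my-custom-pipeline")
    return [ToolProcessorStub(tool)]


tools = asyncio.run(build_tools({}))
```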
Tool API Endpoint
Once you implement build_tools(), your pipeline is automatically exposed via API:
Endpoint:
Example Request:
Response:
Tool Components
1. Input Schema (MyPipelineToolInput)
Defines expected input fields
Uses Pydantic for validation
Documents each field with description
2. Tool Processor (MyPipelineToolProcessor)
- `preprocess()`: Validates inputs, adds required fields, merges config
- `postprocess()`: Formats output, extracts relevant fields
3. Build Tools Function
- Creates the pipeline using `build()`
- Sets `_input_type` and `_context_schema`
- Converts the pipeline to a tool with `.as_tool()`
- Wraps it in a `ToolProcessor`
- Returns a list of tool processors
Key Requirements
Input Type:
- Must be a Pydantic `BaseModel`
- Set via `pipeline._input_type`
- Validates API request inputs

Context Schema:
- Typically `GeneralPipelineConfig`
- Set via `pipeline._context_schema`
- Provides runtime context (user_id, conversation_id, etc.)

ToolProcessor:
- Must extend the `ToolProcessor` base class
- Implement `preprocess()` for input transformation
- Implement `postprocess()` for output transformation
- Wraps the tool for API exposure
Configuration Best Practices
When to Use Preset Config
Use preset configuration to define custom configuration fields via database migration:
Add new custom fields to the system
Set default values for your custom fields
Make fields configurable in Admin Dashboard UI
Store configuration at chatbot instance level
Example fields: custom_retrieval_threshold, enable_custom_reranking, custom_batch_size
Where accessible: build(), build_initial_state(), build_tools() only
When values are set: Configuration time (Admin Dashboard)
When to Use Runtime Config
Use runtime configuration to make PresetConfig fields accessible during pipeline execution:
- Make custom fields accessible in pipeline steps (via `input_map`)
- Allow per-request overrides of preset values
- Enable dynamic parameter changes per API request
- Control which fields are exposed at execution time
Example fields: Same as PresetConfig fields you want accessible during execution
Where accessible: Everywhere - build(), build_initial_state(), build_tools(), AND pipeline execution steps
When values are set:
Defaults from PresetConfig (configuration time)
Can be overridden per request (execution time)
Important Rule:
PresetConfig only: Field accessible only during build time
PresetConfig + RuntimeConfig: Field accessible during build time AND execution time
RuntimeConfig acts as a "gate" - only fields defined in RuntimeConfig are available in execution steps
Comparison Table
| | PresetConfig | RuntimeConfig | GeneralPipelineConfig |
| --- | --- | --- | --- |
| When values are set | Configuration time (building chatbot) | Execution time (per request) | Execution time (per request) |
| Accessible in `build()` | ✅ Yes (via `pipeline_config`) | ✅ Yes (via `pipeline_config`) | ❌ No (use in `build_initial_state`) |
| Accessible in execution (steps) | ❌ No (unless also in RuntimeConfig) | ✅ Yes (via `input_map`) | ✅ Yes (via `input_map`) |
| Purpose | Define custom config fields with defaults | Make fields accessible during execution + allow overrides | System-wide common config |
| Source | Database migration + class definition | Extracts from `pipeline_config` | Provided by system |
| When to create | When adding custom fields | When you need fields accessible in execution | Always available (no need to create) |
| Typical fields | Custom thresholds, custom features | Same fields as PresetConfig (for execution access) | User context, chat history, system settings |
| Example | `custom_threshold: float = 0.7` | `custom_threshold: float = 0.7` (makes it accessible in steps) | `user_id: str`, `conversation_id: str` |
Key Differences:
- PresetConfig: Values accessible only in `build()`, `build_initial_state()`, `build_tools()`
- RuntimeConfig: Makes PresetConfig values accessible during execution + allows per-request overrides
- GeneralPipelineConfig: System-wide fields always accessible during execution
- To use a PresetConfig field in steps, you MUST also define it in RuntimeConfig
Examples
Example: Custom RAG Pipeline with PresetConfig + RuntimeConfig
Scenario: You're adding custom fields for retrieval and reranking configuration, and you want them accessible during pipeline execution.
Step 1: Add to migration
Step 2: Create PresetConfig
Step 3: Create RuntimeConfig (REQUIRED for execution access)
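A sketch of the paired classes for Steps 2 and 3, assuming Pydantic: the RuntimeConfig mirrors the PresetConfig fields so they reach execution steps. The field names are this guide's running examples:

```python
from pydantic import BaseModel, Field


class MyCustomPresetConfig(BaseModel):
    custom_retrieval_top_k: int = Field(default=20, ge=1, le=100)
    enable_custom_reranking: bool = Field(default=False)


class MyCustomRuntimeConfig(BaseModel):
    # Same names, types, and defaults as the PresetConfig above.
    custom_retrieval_top_k: int = Field(default=20, ge=1, le=100)
    enable_custom_reranking: bool = Field(default=False)
```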
Step 4: Use in pipeline
Why both are needed:
PresetConfig: Defines fields in DB + Admin Dashboard UI + provides defaults
RuntimeConfig: Makes fields accessible in execution steps + allows overrides
Example: Pipeline with RuntimeConfig
Scenario: You want to allow runtime overrides for your custom preset values.
How it works:
1. User sets `custom_retrieval_top_k = 10` in Admin Dashboard (PresetConfig)
2. By default, the pipeline uses `10` when executed
3. User can override to `custom_retrieval_top_k = 5` in an API request (RuntimeConfig)
4. For that specific request, the component receives `5` via `input_map`
5. The next request without an override uses `10` again (back to preset)
Example: Using GeneralPipelineConfig
Scenario: You need access to user context and chat history.
Example: PresetConfig vs RuntimeConfig Flow
Complete workflow showing how preset and runtime values work together:
Step 1: Define PresetConfig (values set at configuration time)
Step 2: Define RuntimeConfig (same fields, can be overridden at runtime)
Step 3: Use in pipeline
Execution Flow:
1. Admin Dashboard: User sets `retrieval_top_k = 20` (saved as preset)
2. API Request 1 (no override): Component receives `retrieval_top_k = 20` via `input_map`
3. API Request 2 (with override `retrieval_top_k = 5`): Component receives `5` via `input_map`
4. API Request 3 (no override): Component receives `20` again via `input_map`
Understanding the Merged State
The merged state is the core mechanism for passing data through your pipeline. Here's how it works:
Stage 1: Build Time (build() function)
- Access `pipeline_config` for conditional logic and component creation
- Use `pipeline_config.get("key")` to read preset/runtime values
- Create pipeline steps with `input_map`
Stage 2: Execution Time (pipeline running)
System automatically merges three sources into one state dict:
1. Values from the `build_initial_state()` return
2. Runtime config values (PresetConfig defaults + RuntimeConfig overrides)
3. General config values (`GeneralPipelineConfig`)

Components receive values via `input_map` keys; all sources are accessible through the same key namespace.
Visual Example:
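The merge, in miniature (keys are illustrative; remember to avoid collisions across the three sources):

```python
state = {"standalone_query": "What is RAG?"}      # build_initial_state()
runtime_config = {"custom_retrieval_top_k": 20}   # RuntimeConfig fields
general_config = {"user_id": "u-1"}               # GeneralPipelineConfig

merged_state = {**state, **runtime_config, **general_config}
```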
Key Takeaway: Use input_map to access any value from the merged state - you don't need to know which source it came from!
Critical Rule: Only fields defined in RuntimeConfig (or returned from build_initial_state(), or from GeneralPipelineConfig) are available in the merged state. PresetConfig fields are NOT automatically merged - you must also define them in RuntimeConfig to make them accessible during execution.