Advanced Configuration

This guide covers advanced configuration options for custom pipelines, including preset configurations, runtime configurations, and building tools.

Optional Module Attributes

Add these optional attributes to your pipeline.py:

from glchat_be.config.pipeline.my_custom_pipeline.preset_config import MyCustomPresetConfig
from glchat_be.config.pipeline.my_custom_pipeline.runtime_config import MyCustomRuntimeConfig

# Optional: Override pipeline name
name = "my-custom-pipeline-v2"

# Optional: Preset configuration class
preset_config_class = MyCustomPresetConfig

# Optional: Runtime configuration class
additional_config_class = MyCustomRuntimeConfig

# Required functions
async def build(...): pass
def build_initial_state(...): pass

# Optional: Tools function
async def build_tools(...): pass

Preset Configuration

Preset configuration defines pipeline-specific settings that are configured in the Admin Dashboard when building/creating the chatbot. These "preset" values serve as defaults for the pipeline and can optionally be overridden at runtime via RuntimeConfig.

When values are set: Configuration time (Admin Dashboard)

When values are used: Pipeline build time (when constructing the pipeline)

💡 Coming from Database Migration? If you added custom configuration fields in your migration file, you must create a PresetConfig class with matching fields. See the Database Migration Guide for the migration side of custom fields.

Creating Preset Config

Create a preset_config.py file in your pipeline directory:
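A minimal sketch of what preset_config.py might contain. It assumes a plain Pydantic BaseModel base (the actual base class in glchat_be may differ), and the field names and defaults are illustrative:

```python
# preset_config.py -- minimal sketch; base class and field names are illustrative.
from pydantic import BaseModel, Field


class MyCustomPresetConfig(BaseModel):
    """Preset configuration for my-custom-pipeline, set in the Admin Dashboard."""

    retrieval_top_k: int = Field(
        default=20,
        ge=1,
        le=100,
        description="Number of documents to retrieve from the knowledge base.",
    )
    enable_reranking: bool = Field(
        default=True,
        description="Whether to rerank retrieved documents before generation.",
    )
    score_threshold: float = Field(
        default=0.7,
        ge=0.0,
        le=1.0,
        description="Minimum similarity score for retrieved documents.",
    )
```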

Important Notes:

  • ✅ Field names must match exactly with migration field names (if adding via migration)

  • ✅ Field types must match migration type field (int, float, bool, str)

  • ✅ Default values should match migration default_value

  • ✅ Use Pydantic Field() for validation constraints (ge, le, min_length, etc.)

  • ✅ Add comprehensive docstrings for each field

Registering Preset Config

Register it in pipeline.py:
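Registration is a module attribute assignment, matching the optional attributes shown at the top of this guide:

```python
# pipeline.py -- point the optional module attribute at your class
from glchat_be.config.pipeline.my_custom_pipeline.preset_config import MyCustomPresetConfig

preset_config_class = MyCustomPresetConfig
```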

Accessing Preset Config

PresetConfig values are accessible during pipeline build time via the pipeline_config dictionary:

Accessible in:

  • build() function

  • build_initial_state() function

  • build_tools() function (if defined)

Example:
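A sketch of reading preset values inside build(). The exact build() signature is an assumption; the preset fields arrive as plain entries in the pipeline_config dictionary, and the field names here are illustrative:

```python
# Sketch -- build() signature is an assumption; preset values arrive
# as plain entries in the pipeline_config dict.
async def build(pipeline_config: dict, **kwargs):
    top_k = pipeline_config.get("retrieval_top_k", 20)            # preset field
    enable_reranking = pipeline_config.get("enable_reranking", True)
    # ... construct components conditionally on these values ...
```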

Note: PresetConfig values are NOT accessible during pipeline execution (in steps). See RuntimeConfig section below for how to make them accessible during execution.

Preset Config vs Migration Fields

When adding custom configuration fields, you need to define them in two places:

| Aspect | Migration File | PresetConfig Class |
| --- | --- | --- |
| Purpose | Stores field definitions in database | Provides type safety and validation |
| When to create | When adding custom fields | Always when adding custom fields |
| Field definition | Dict with type, default_value, ui_type | Pydantic Field with type hints |
| Validation | Basic (via UI type) | Comprehensive (Pydantic validators) |
| Location | migration/versions/*.py | glchat_be/config/pipeline/<name>/preset_config.py |

Example Mapping:
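A side-by-side sketch of the same field in both places. The migration dict shape follows the "Field definition" row above; the exact migration schema is an assumption, and `custom_threshold` is an illustrative name:

```python
# migration/versions/<revision>.py -- field definition stored in the database
# (exact migration schema assumed)
"custom_threshold": {
    "type": "float",
    "default_value": 0.7,
    "ui_type": "number",
},

# preset_config.py -- the matching Pydantic field: same name, type, and default
custom_threshold: float = Field(default=0.7, ge=0.0, le=1.0)
```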

Key Rule: Field names, types, and defaults must match between migration and PresetConfig.

Runtime Configuration

Runtime configuration serves two critical purposes:

  1. Makes PresetConfig values accessible during pipeline execution (in steps via input_map)

  2. Allows per-request overrides of preset values

Key Concept: PresetConfig values are ONLY accessible in build(), build_initial_state(), and build_tools(). To access them during pipeline execution (in steps), you MUST also define them in RuntimeConfig.

When values are set:

  • Defaults from PresetConfig (configuration time in Admin Dashboard)

  • Can be overridden per API request (execution time)

When values are used: Pipeline execution (per request)

Critical Rule:

  • PresetConfig only: Values accessible during build time, NOT during execution

  • PresetConfig + RuntimeConfig: Values accessible during build time AND execution time

  • RuntimeConfig acts as a gate: Only fields defined in RuntimeConfig are available in execution steps

Both should contain the same fields when you want preset values accessible during execution. RuntimeConfig values take precedence over PresetConfig values when provided in API requests.

Creating Runtime Config

To make PresetConfig values accessible during execution, create a RuntimeConfig with the same fields as your PresetConfig.

Create a runtime_config.py file in your pipeline directory:
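A minimal sketch of runtime_config.py. It mirrors the preset fields so they reach execution steps; the base class, class names, and the companion StrEnum (Python 3.11+) are illustrative:

```python
# runtime_config.py -- minimal sketch; names and base class are illustrative.
from enum import StrEnum  # Python 3.11+

from pydantic import BaseModel, Field


class MyCustomRuntimeKeys(StrEnum):
    """Key names for input_map wiring, to avoid magic strings."""

    RETRIEVAL_TOP_K = "retrieval_top_k"
    ENABLE_RERANKING = "enable_reranking"


class MyCustomRuntimeConfig(BaseModel):
    """Runtime configuration; field names, types, and defaults should
    match the preset config so preset values flow through."""

    retrieval_top_k: int = Field(default=20, ge=1, le=100)
    enable_reranking: bool = Field(default=True)
```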

Key Points:

  • RuntimeConfig makes PresetConfig values accessible during pipeline execution

  • Fields should match your PresetConfig fields (same names, types, defaults)

  • Values come from PresetConfig by default, can be overridden per request

  • Often includes a companion StrEnum for key names to avoid magic strings

Relationship with PresetConfig:

What happens:

  1. User sets retrieval_top_k = 30 in Admin Dashboard (stored in PresetConfig)

  2. During execution, RuntimeConfig gets value 30 from PresetConfig

  3. Component accesses retrieval_top_k = 30 via input_map

  4. User can override to retrieval_top_k = 10 in API request

  5. For that request, component gets 10 instead of 30

Registering Runtime Config

Register it in pipeline.py:
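As with the preset config, registration is a module attribute assignment, using the additional_config_class attribute shown at the top of this guide:

```python
# pipeline.py -- runtime config is registered via additional_config_class
from glchat_be.config.pipeline.my_custom_pipeline.runtime_config import MyCustomRuntimeConfig

additional_config_class = MyCustomRuntimeConfig
```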

Accessing Runtime Config (and PresetConfig values during execution)

Runtime config values, along with state values and general config values, are merged into a single state dictionary during pipeline execution. You access these values through the input_map in your pipeline steps.

This is how you access PresetConfig values during pipeline execution!

How the merged state works:

The pipeline execution merges three sources into one state dictionary:

  1. State (from build_initial_state() return)

  2. Runtime config (ONLY fields defined in RuntimeConfig class)

    • Values come from PresetConfig defaults

    • Can be overridden per API request

  3. General config (from GeneralPipelineConfig)

Critical Points:

  • ✅ Only fields defined in RuntimeConfig are merged into execution state

  • ❌ PresetConfig fields NOT in RuntimeConfig are NOT accessible in steps

  • ✅ RuntimeConfig fields get their default values from PresetConfig

  • ⚠️ Avoid duplicate keys across these three sources!

Accessing values in pipeline steps:
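An illustrative input_map in the `"component_parameter": "state_key"` format. The keys on the left are hypothetical component parameters; the state keys on the right show how one map can pull from all three merged sources:

```python
# Illustrative input_map -- left side: component parameters (hypothetical),
# right side: keys in the merged execution state.
input_map = {
    "query": "standalone_query",    # from state (preprocessing output)
    "top_k": "retrieval_top_k",     # from RuntimeConfig (preset default or per-request override)
    "user_id": "user_id",           # from GeneralPipelineConfig
}
```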

Key Points:

  • input_map format: "component_parameter": "state_key"

  • State keys can come from: state dict, runtime config, or general config

  • All merged automatically: Runtime config and general config are automatically merged with state

  • You don't need to manually add runtime config values to state - they're already accessible

  • Avoid duplicates: Ensure keys don't conflict across state, runtime config, and general config

Example mapping sources:

Note: Runtime config values (from PresetConfig/RuntimeConfig) and general config values are automatically merged into the state. You can access them directly via input_map without adding them to build_initial_state() return.

General Pipeline Config (Available to All Pipelines)

All pipelines have access to GeneralPipelineConfig, which contains common runtime information such as request_id, user_id, and conversation_id.

Accessing General Pipeline Config:
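Because GeneralPipelineConfig fields are merged into the execution state automatically, a step can map them directly. The component parameter names on the left are hypothetical; the field names on the right come from the list below:

```python
# Illustrative input_map pulling GeneralPipelineConfig fields; the
# parameter names on the left are hypothetical.
input_map = {
    "uid": "user_id",
    "conversation": "conversation_id",
    "kb": "knowledge_base_id",
}
```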

Important: Values returned from build_initial_state() are merged with runtime config and general config. During pipeline execution, all these sources are combined into one state dictionary accessible via input_map.

Merged State = State + RuntimeConfig + GeneralConfig

Common GeneralPipelineConfig Fields:

  • User Context: user_id, conversation_id, source

  • Knowledge Base: knowledge_base_id, connectors, attachments

  • Search Settings: search_type, normal_search_top_k, smart_search_top_k, web_search_top_k

  • System Settings: enable_guardrails, use_memory, augment_context

  • Model Config: model_name, model_kwargs, hyperparameters

See glchat_be/config/pipeline/general_pipeline_config.py for the complete list of available fields.

Pipeline Execution Flow

When a user sends a message, your custom pipeline is not the only thing that executes. GLChat runs a series of automated stages before and after your pipeline to handle common tasks like guardrails, preprocessing, and postprocessing.

Understanding this flow is crucial for building effective pipelines.

Complete Pipeline Architecture

Key Point: You only need to implement Your Pipeline. Everything else is handled automatically by the system.

Stages Before Your Pipeline

1. Guardrails (Optional)

  • Checks user input for harmful or disallowed content

  • If detected, stops the entire pipeline before execution

  • Can be disabled from Admin Dashboard

2. DPO (Document Processing Orchestrator) (Optional)

  • Processes uploaded files (PDFs, images, etc.) from chat UI

  • Preprocessing later decides whether to use DPO output or native model handling

  • Can be disabled from Admin Dashboard

3. Preprocessing (Automatic)

This is where the "auto-magic" happens. Preprocessing handles:

  1. Retrieve Chat History: Pulls all previous messages, metadata, and attachments

  2. Process Attachments:

    • If model can read file directly (e.g., GPT-4 with vision) → use it

    • If not → use DPO's processed version

    • If neither can handle it → skip gracefully

  3. Anonymize User Query (if enabled): Masks PII and stores as masked_user_query

  4. Generate Standalone Query: Creates a condensed query based on user input + recent messages

  5. Check Cache (if enabled): Returns cached response instantly if found (sets cache_hit = True)

  6. Retrieve Memory (if enabled): Fetches user memory from past conversations

All preprocessing output is passed to your pipeline via previous_state.

4. Router (Automatic)

  • Forwards preprocessed data to your pipeline

  • No configuration needed

Your Pipeline

This is where you focus! Your custom pipeline receives previous_state containing all the preprocessed data.

Important: Your build_initial_state() must accept previous_state:
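A minimal sketch of the signature. Only the previous_state parameter is required by this guide; the other parameter names are assumptions:

```python
# Minimal sketch -- parameter names other than previous_state are assumptions.
def build_initial_state(pipeline_config: dict, previous_state: dict, **kwargs) -> dict:
    return {
        # Carry over preprocessing output
        "user_query": previous_state["user_query"],
        "standalone_query": previous_state["standalone_query"],
        # ... your pipeline-specific state keys here ...
        # Fields required by postprocessing
        "events": [],
        "related": [],
        "response": "",
    }
```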

Available in previous_state:

  • user_query: Original user message

  • standalone_query: Condensed query for retrieval

  • masked_user_query: Anonymized query (if anonymization enabled)

  • chat_history: List of previous messages

  • attachments: Processed attachments

  • cache_hit: Whether cache was found

  • memory: Retrieved user memory (if enabled)

  • event_emitter: Event emitter for streaming

  • And more... (see PreprocessingState)

Stages After Your Pipeline

5. Postprocessing (Automatic)

After your pipeline completes, postprocessing handles:

  1. Save Cache (if enabled): Stores response for future cache hits

  2. Save Chat History: Saves the Q&A pair including metadata, attachments, PII mappings

  3. Save Memory (if enabled): Stores the conversation if considered meaningful

Required State Fields for Postprocessing:

Your pipeline state MUST include these fields for postprocessing to work:

| Field | Type | Description |
| --- | --- | --- |
| events | list[dict[str, Any]] | Events for thinking & activity tracking. Empty list if none. |
| related | list[str] | Related topics or concepts. Empty list if none. |
| response | str | The response to the user. |

Most other required fields are already provided by preprocessing - just pass them through!

Example: Complete Pipeline with Preprocessing
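A condensed end-to-end sketch. The function signatures and the way build() constructs components are assumptions; only the state keys come from this guide:

```python
# Condensed sketch -- signatures and component construction are assumptions.
async def build(pipeline_config: dict, **kwargs):
    top_k = pipeline_config.get("retrieval_top_k", 20)  # preset value at build time
    # ... construct components and steps here, wiring them with input_map ...


def build_initial_state(pipeline_config: dict, previous_state: dict, **kwargs) -> dict:
    return {
        # Pass through preprocessing output
        "user_query": previous_state["user_query"],
        "standalone_query": previous_state["standalone_query"],
        "chat_history": previous_state.get("chat_history", []),
        "event_emitter": previous_state.get("event_emitter"),
        # Fields required by postprocessing
        "events": [],
        "related": [],
        "response": "",
    }
```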

Benefits of This Architecture

  • Less boilerplate: No need to implement chat history retrieval, attachment processing, etc.

  • Automatic caching: Cache checking and saving handled for you

  • Built-in PII protection: Anonymization handled if enabled

  • Memory support: User memory retrieval and saving automatic

  • Focus on logic: Spend time on your custom pipeline behavior, not infrastructure

Pipeline Config Resolver Utility

GLChat provides a PipelineConfigResolver utility class that simplifies access to common pipeline configurations like LLM models, embeddings, and other frequently used settings.

What is PipelineConfigResolver?

PipelineConfigResolver is a helper class that:

  • Provides easy access to common pipeline configurations

  • Handles default values automatically

  • Lazily initializes expensive resources (LM invoker, EM invoker, etc.)

  • Validates configuration values

Using PipelineConfigResolver

Import and initialize:
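A sketch of initialization inside build(). The import path and constructor arguments are assumptions; only the class name and its properties come from this guide:

```python
# Import path and constructor arguments are assumptions.
from glchat_be.utils.pipeline_config_resolver import PipelineConfigResolver


async def build(pipeline_config: dict, **kwargs):
    resolver = PipelineConfigResolver(pipeline_config)
    lm_invoker = resolver.lm_invoker  # lazily initialized on first access
```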

Available Properties

Model Configuration:

  • model_name: LLM model name (e.g., "openai/gpt-4")

  • model_kwargs: Model-specific kwargs

  • model_env_kwargs: Environment-specific model kwargs

  • model_config: ModelConfig tuple (name, kwargs, env_kwargs)

Vectorizer/Embeddings:

  • vectorizer_config: VectorizerConfig (name, model, kwargs)

  • em_invoker: Embedding model invoker (BaseEMInvoker)

  • langchain_embeddings: Langchain-compatible embeddings (Embeddings)

Invokers:

  • lm_invoker: Language model invoker (BaseLMInvoker)

  • em_invoker: Embedding model invoker (BaseEMInvoker)

Pipeline Settings:

  • prompt_context_char_threshold: Character limit for prompt context

  • chat_history_limit: Maximum chat history messages

  • reference_formatter_batch_size: Batch size for reference formatting

  • reference_formatter_threshold: Threshold for reference formatter

  • strategy_batch_size: Batch size for generation strategy

  • generation_strategy: Generation strategy ("stuff" or "refine")

  • enable_guardrails: Whether guardrails are enabled

  • support_multimodal: Whether multimodal is supported

Retrievers and Rerankers:

  • kb_retriever: Knowledge base retriever (BasicVectorRetriever)

  • reranker: Reranker instance (BaseReranker | None)

Other:

  • modality_converter: Modality converter for multimodal (BaseModalityConverter | None)

  • rago_pipeline: RAGO pipeline name

  • pipeline_preset_id: Pipeline preset ID

Example: Using PipelineConfigResolver
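The property names below come from the "Available Properties" list above; the constructor call is an assumption:

```python
# Constructor call assumed; property names from "Available Properties".
resolver = PipelineConfigResolver(pipeline_config)

lm_invoker = resolver.lm_invoker            # BaseLMInvoker
embeddings = resolver.langchain_embeddings  # Langchain-compatible Embeddings
retriever = resolver.kb_retriever           # BasicVectorRetriever
reranker = resolver.reranker                # BaseReranker | None
history_limit = resolver.chat_history_limit
strategy = resolver.generation_strategy     # "stuff" or "refine"
```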

Benefits of Using PipelineConfigResolver

  • Cleaner code: No need to manually extract and validate config values

  • Default values: Automatically handles missing configurations with sensible defaults

  • Type safety: Returns properly typed objects (invokers, embeddings, etc.)

  • Lazy loading: Expensive resources only initialized when accessed

  • Consistency: Use the same configuration patterns across all pipelines

Direct Access vs PipelineConfigResolver

Without PipelineConfigResolver:
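Manual extraction means handling every default and type conversion yourself. The default values below are placeholders, not the system's real defaults:

```python
pipeline_config = {"model_name": "openai/gpt-4"}  # illustrative input

# Every default and type conversion handled by hand (placeholder defaults):
model_name = pipeline_config.get("model_name") or "openai/gpt-4"
model_kwargs = pipeline_config.get("model_kwargs") or {}
chat_history_limit = int(pipeline_config.get("chat_history_limit", 10))
generation_strategy = pipeline_config.get("generation_strategy", "stuff")
```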

With PipelineConfigResolver:
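The resolver handles defaults and typing internally, so the same lookups collapse to property access (constructor call assumed):

```python
resolver = PipelineConfigResolver(pipeline_config)  # constructor assumed

model_name = resolver.model_name              # defaults handled internally
model_kwargs = resolver.model_kwargs
chat_history_limit = resolver.chat_history_limit
generation_strategy = resolver.generation_strategy
```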

Building Tools

Tools allow you to expose your entire pipeline as a callable component via API. This enables your pipeline to be used as a tool within agent workflows or called directly through the /components/{component_id}/run endpoint.

How Tools Work

When you implement build_tools(), your pipeline becomes accessible as a tool:

  1. Your pipeline is converted to a tool using .as_tool()

  2. Wrapped in a ToolProcessor for input/output processing

  3. Exposed via API endpoint: POST /components/{chatbot_id}:{tool_name}/run

Creating Tools

Implement the build_tools() function in your pipeline.py:
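A sketch of the three pieces described under "Tool Components". The ToolProcessor import path and its exact method signatures are assumptions; _input_type, _context_schema, and .as_tool() are named in this guide:

```python
# Sketch -- ToolProcessor import path and method signatures are assumptions.
from pydantic import BaseModel, Field

from glchat_be.config.pipeline.general_pipeline_config import GeneralPipelineConfig
from glchat_be.tools import ToolProcessor  # import path assumed


class MyPipelineToolInput(BaseModel):
    """Input schema validated on each /components/.../run request."""

    query: str = Field(description="User query to run through the pipeline.")


class MyPipelineToolProcessor(ToolProcessor):
    def preprocess(self, inputs: dict) -> dict:
        # Validate inputs and merge any required config before execution.
        return inputs

    def postprocess(self, output: dict) -> dict:
        # Keep only the fields callers need.
        return {"response": output.get("response", "")}


async def build_tools(pipeline_config: dict, **kwargs) -> list:
    pipeline = await build(pipeline_config, **kwargs)  # your existing build()
    pipeline._input_type = MyPipelineToolInput
    pipeline._context_schema = GeneralPipelineConfig
    tool = pipeline.as_tool(name="my_custom_pipeline")
    return [MyPipelineToolProcessor(tool)]
```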

Tool API Endpoint

Once you implement build_tools(), your pipeline is automatically exposed via API:

Endpoint:

Example Request:
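A hypothetical request against the `/components/{chatbot_id}:{tool_name}/run` endpoint; the host, chatbot ID, tool name, and payload fields are all placeholders:

```shell
# Hypothetical request -- host, IDs, and payload fields are placeholders.
curl -X POST "https://<glchat-host>/components/my-chatbot:my_custom_pipeline/run" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the refund policy?"}'
```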

Response:

Tool Components

1. Input Schema (MyPipelineToolInput)

  • Defines expected input fields

  • Uses Pydantic for validation

  • Documents each field with description

2. Tool Processor (MyPipelineToolProcessor)

  • preprocess(): Validates inputs, adds required fields, merges config

  • postprocess(): Formats output, extracts relevant fields

3. Build Tools Function

  • Creates pipeline using build()

  • Sets _input_type and _context_schema

  • Converts pipeline to tool with .as_tool()

  • Wraps in ToolProcessor

  • Returns list of tool processors

Key Requirements

Input Type:

  • Must be a Pydantic BaseModel

  • Set via pipeline._input_type

  • Validates API request inputs

Context Schema:

  • Typically GeneralPipelineConfig

  • Set via pipeline._context_schema

  • Provides runtime context (user_id, conversation_id, etc.)

ToolProcessor:

  • Must extend ToolProcessor base class

  • Implement preprocess() for input transformation

  • Implement postprocess() for output transformation

  • Wraps the tool for API exposure

Configuration Best Practices

When to Use Preset Config

Use preset configuration to define custom configuration fields via database migration:

  • Add new custom fields to the system

  • Set default values for your custom fields

  • Make fields configurable in Admin Dashboard UI

  • Store configuration at chatbot instance level

Example fields: custom_retrieval_threshold, enable_custom_reranking, custom_batch_size

Where accessible: build(), build_initial_state(), build_tools() only

When values are set: Configuration time (Admin Dashboard)

When to Use Runtime Config

Use runtime configuration to make PresetConfig fields accessible during pipeline execution:

  • Make custom fields accessible in pipeline steps (via input_map)

  • Allow per-request overrides of preset values

  • Enable dynamic parameter changes per API request

  • Control which fields are exposed at execution time

Example fields: Same as PresetConfig fields you want accessible during execution

Where accessible: Everywhere - build(), build_initial_state(), build_tools(), AND pipeline execution steps

When values are set:

  • Defaults from PresetConfig (configuration time)

  • Can be overridden per request (execution time)

Important Rule:

  • PresetConfig only: Field accessible only during build time

  • PresetConfig + RuntimeConfig: Field accessible during build time AND execution time

  • RuntimeConfig acts as a "gate" - only fields defined in RuntimeConfig are available in execution steps

Comparison Table

| Aspect | PresetConfig | RuntimeConfig | GeneralPipelineConfig |
| --- | --- | --- | --- |
| When values are set | Configuration time (building chatbot) | Execution time (per request) | Execution time (per request) |
| Accessible in build() | ✅ Yes (via pipeline_config) | ✅ Yes (via pipeline_config) | ❌ No (use in build_initial_state) |
| Accessible in execution (steps) | ❌ No (unless also in RuntimeConfig) | ✅ Yes (via input_map) | ✅ Yes (via input_map) |
| Purpose | Define custom config fields with defaults | Make fields accessible during execution + allow overrides | System-wide common config |
| Source | Database migration + class definition | Extracts from pipeline_config | Provided by system |
| When to create | When adding custom fields | When you need fields accessible in execution | Always available (no need to create) |
| Typical fields | Custom thresholds, custom features | Same fields as PresetConfig (for execution access) | User context, chat history, system settings |
| Example | custom_threshold: float = 0.7 | custom_threshold: float = 0.7 (makes it accessible in steps) | user_id: str, conversation_id: str |

Key Differences:

  • PresetConfig: Values accessible only in build(), build_initial_state(), build_tools()

  • RuntimeConfig: Makes PresetConfig values accessible during execution + allows per-request overrides

  • GeneralPipelineConfig: System-wide fields always accessible during execution

  • To use a PresetConfig field in steps: You MUST also define it in RuntimeConfig

Examples

Example: Custom RAG Pipeline with PresetConfig + RuntimeConfig

Scenario: You're adding custom fields for retrieval and reranking configuration, and you want them accessible during pipeline execution.

Step 1: Add to migration

Step 2: Create PresetConfig

Step 3: Create RuntimeConfig (REQUIRED for execution access)

Step 4: Use in pipeline
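The four steps above, condensed into one sketch. The migration dict shape, module layout, class names, and field names are all illustrative:

```python
# Step 1 -- migration: field definitions stored in the database
# (dict shape per "Field definition" above; exact migration schema assumed)
CUSTOM_FIELDS = {
    "custom_retrieval_top_k": {"type": "int", "default_value": 20, "ui_type": "number"},
    "custom_rerank_threshold": {"type": "float", "default_value": 0.7, "ui_type": "number"},
}

# Step 2 -- preset_config.py: matching names, types, and defaults
from pydantic import BaseModel, Field


class MyRagPresetConfig(BaseModel):
    custom_retrieval_top_k: int = Field(default=20, ge=1)
    custom_rerank_threshold: float = Field(default=0.7, ge=0.0, le=1.0)


# Step 3 -- runtime_config.py: same fields, so they reach execution steps
class MyRagRuntimeConfig(BaseModel):
    custom_retrieval_top_k: int = Field(default=20, ge=1)
    custom_rerank_threshold: float = Field(default=0.7, ge=0.0, le=1.0)


# Step 4 -- pipeline.py: map the fields into a step via input_map
# (parameter names on the left are hypothetical)
input_map = {
    "top_k": "custom_retrieval_top_k",
    "threshold": "custom_rerank_threshold",
}
```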

Why both are needed:

  • PresetConfig: Defines fields in DB + Admin Dashboard UI + provides defaults

  • RuntimeConfig: Makes fields accessible in execution steps + allows overrides

Example: Pipeline with RuntimeConfig

Scenario: You want to allow runtime overrides for your custom preset values.

How it works:

  1. User sets custom_retrieval_top_k = 10 in Admin Dashboard (PresetConfig)

  2. By default, pipeline uses 10 when executed

  3. User can override to custom_retrieval_top_k = 5 in API request (RuntimeConfig)

  4. For that specific request, the component receives 5 via input_map

  5. Next request without override uses 10 again (back to preset)

Example: Using GeneralPipelineConfig

Scenario: You need access to user context and chat history.

Example: PresetConfig vs RuntimeConfig Flow

Complete workflow showing how preset and runtime values work together:

Step 1: Define PresetConfig (values set at configuration time)

Step 2: Define RuntimeConfig (same fields, can be overridden at runtime)

Step 3: Use in pipeline
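The three steps above, sketched for the retrieval_top_k value used in the execution flow below; class names are illustrative:

```python
from pydantic import BaseModel, Field


# Step 1 -- preset: default set at configuration time (Admin Dashboard)
class FlowPresetConfig(BaseModel):
    retrieval_top_k: int = Field(default=20, ge=1, le=100)


# Step 2 -- runtime: same field, overridable per API request
class FlowRuntimeConfig(BaseModel):
    retrieval_top_k: int = Field(default=20, ge=1, le=100)


# Step 3 -- a pipeline step reads it through input_map
input_map = {"top_k": "retrieval_top_k"}
```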

Execution Flow:

  1. Admin Dashboard: User sets retrieval_top_k = 20 (saved as preset)

  2. API Request 1 (no override): Component receives retrieval_top_k = 20 via input_map

  3. API Request 2 (with override retrieval_top_k = 5): Component receives 5 via input_map

  4. API Request 3 (no override): Component receives 20 again via input_map

Understanding the Merged State

The merged state is the core mechanism for passing data through your pipeline. Here's how it works:

Stage 1: Build Time (build() function)

  • Access pipeline_config for conditional logic and component creation

  • Use pipeline_config.get("key") to read preset/runtime values

  • Create pipeline steps with input_map

Stage 2: Execution Time (pipeline running)

  • System automatically merges three sources into one state dict:

    1. Values from build_initial_state() return

    2. Runtime config values (PresetConfig + RuntimeConfig overrides)

    3. General config values (GeneralPipelineConfig)

  • Components receive values via input_map keys

  • All sources are accessible through the same key namespace

Visual Example:
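An illustrative merge using plain dicts. The keys and values are hypothetical; in practice the system performs this merge for you:

```python
# Illustrative merge -- the system does this automatically at execution time.
state = {"user_query": "hi", "documents": []}                   # build_initial_state()
runtime_config = {"retrieval_top_k": 20}                        # RuntimeConfig (preset default)
general_config = {"user_id": "u-1", "conversation_id": "c-1"}   # GeneralPipelineConfig

merged_state = {**state, **runtime_config, **general_config}
# A step with input_map={"top_k": "retrieval_top_k"} now receives 20.
```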

Key Takeaway: Use input_map to access any value from the merged state - you don't need to know which source it came from!

Critical Rule: Only fields defined in RuntimeConfig (or returned from build_initial_state(), or from GeneralPipelineConfig) are available in the merged state. PresetConfig fields are NOT automatically merged - you must also define them in RuntimeConfig to make them accessible during execution.
