Advanced Configuration

This guide covers advanced configuration options for custom pipelines, including preset configurations, runtime configurations, and building tools.

Optional Module Attributes

Add these optional attributes to your pipeline.py:

from glchat_be.config.pipeline.my_custom_pipeline.preset_config import MyCustomPresetConfig
from glchat_be.config.pipeline.my_custom_pipeline.runtime_config import MyCustomRuntimeConfig

# Optional: Override pipeline name
name = "my-custom-pipeline-v2"

# Optional: Preset configuration class
preset_config_class = MyCustomPresetConfig

# Optional: Runtime configuration class
additional_config_class = MyCustomRuntimeConfig

# Required functions
async def build(...): pass
def build_initial_state(...): pass

# Optional: Tools function
async def build_tools(...): pass

Preset Configuration

Preset configuration defines pipeline-specific settings that are configured in the Admin Dashboard when building/creating the chatbot. These "preset" values serve as defaults for the pipeline and can optionally be overridden at runtime via RuntimeConfig.

When values are set: Configuration time (Admin Dashboard)

When values are used: Pipeline build time (when constructing the pipeline)

💡 Coming from Database Migration? If you added custom configuration fields in your migration file, you must create a PresetConfig class with matching fields. See the Database Migration Guide for the migration side of custom fields.

Creating Preset Config

Create a preset_config.py file in your pipeline directory:
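A minimal sketch of what preset_config.py might contain. It assumes a plain Pydantic BaseModel base (the actual base class in glchat_be may differ), and the field names and defaults are illustrative:

```python
# preset_config.py -- minimal sketch; base class and field names are illustrative.
from pydantic import BaseModel, Field


class MyCustomPresetConfig(BaseModel):
    """Preset configuration for my-custom-pipeline, set in the Admin Dashboard."""

    retrieval_top_k: int = Field(
        default=20,
        ge=1,
        le=100,
        description="Number of documents to retrieve from the knowledge base.",
    )
    enable_reranking: bool = Field(
        default=True,
        description="Whether to rerank retrieved documents before generation.",
    )
    score_threshold: float = Field(
        default=0.7,
        ge=0.0,
        le=1.0,
        description="Minimum similarity score for retrieved documents.",
    )
```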

Important Notes:

  • ✅ Field names must match exactly with migration field names (if adding via migration)

  • ✅ Field types must match migration type field (int, float, bool, str)

  • ✅ Default values should match migration default_value

  • ✅ Use Pydantic Field() for validation constraints (ge, le, min_length, etc.)

  • ✅ Add comprehensive docstrings for each field

Registering Preset Config

Register it in pipeline.py:
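Registration is a module attribute assignment, matching the optional attributes shown at the top of this guide:

```python
# pipeline.py -- point the optional module attribute at your class
from glchat_be.config.pipeline.my_custom_pipeline.preset_config import MyCustomPresetConfig

preset_config_class = MyCustomPresetConfig
```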

Accessing Preset Config

PresetConfig values are accessible during pipeline build time via the pipeline_config dictionary:

Accessible in:

  • build() function

  • build_initial_state() function

  • build_tools() function (if defined)

Example:
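A sketch of reading preset values inside build(). The exact build() signature is an assumption; the preset fields arrive as plain entries in the pipeline_config dictionary, and the field names here are illustrative:

```python
# Sketch -- build() signature is an assumption; preset values arrive
# as plain entries in the pipeline_config dict.
async def build(pipeline_config: dict, **kwargs):
    top_k = pipeline_config.get("retrieval_top_k", 20)            # preset field
    enable_reranking = pipeline_config.get("enable_reranking", True)
    # ... construct components conditionally on these values ...
```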

Note: PresetConfig values are NOT accessible during pipeline execution (in steps). See RuntimeConfig section below for how to make them accessible during execution.

Preset Config vs Migration Fields

When adding custom configuration fields, you need to define them in two places:

| Aspect | Migration File | PresetConfig Class |
| --- | --- | --- |
| Purpose | Stores field definitions in database | Provides type safety and validation |
| When to create | When adding custom fields | Always when adding custom fields |
| Field definition | Dict with type, default_value, ui_type | Pydantic Field with type hints |
| Validation | Basic (via UI type) | Comprehensive (Pydantic validators) |
| Location | migration/versions/*.py | glchat_be/config/pipeline/<name>/preset_config.py |

Example Mapping:
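A side-by-side sketch of the same field in both places. The migration dict shape follows the "Field definition" row above; the exact migration schema is an assumption, and `custom_threshold` is an illustrative name:

```python
# migration/versions/<revision>.py -- field definition stored in the database
# (exact migration schema assumed)
"custom_threshold": {
    "type": "float",
    "default_value": 0.7,
    "ui_type": "number",
},

# preset_config.py -- the matching Pydantic field: same name, type, and default
custom_threshold: float = Field(default=0.7, ge=0.0, le=1.0)
```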

Key Rule: Field names, types, and defaults must match between migration and PresetConfig.

Runtime Configuration

Runtime configuration serves two critical purposes:

  1. Makes PresetConfig values accessible during pipeline execution (in steps via input_map)

  2. Allows per-request overrides of preset values

Key Concept: PresetConfig values are ONLY accessible in build(), build_initial_state(), and build_tools(). To access them during pipeline execution (in steps), you MUST also define them in RuntimeConfig.

When values are set:

  • Defaults from PresetConfig (configuration time in Admin Dashboard)

  • Can be overridden per API request (execution time)

When values are used: Pipeline execution (per request)

Critical Rule:

  • PresetConfig only: Values accessible during build time, NOT during execution

  • PresetConfig + RuntimeConfig: Values accessible during build time AND execution time

  • RuntimeConfig acts as a gate: Only fields defined in RuntimeConfig are available in execution steps

Both should contain the same fields when you want preset values accessible during execution. RuntimeConfig values take precedence over PresetConfig values when provided in API requests.

Creating Runtime Config

To make PresetConfig values accessible during execution, create a RuntimeConfig with the same fields as your PresetConfig.

Create a runtime_config.py file in your pipeline directory:
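A minimal sketch of runtime_config.py. It mirrors the preset fields so they reach execution steps; the base class, class names, and the companion StrEnum (Python 3.11+) are illustrative:

```python
# runtime_config.py -- minimal sketch; names and base class are illustrative.
from enum import StrEnum  # Python 3.11+

from pydantic import BaseModel, Field


class MyCustomRuntimeKeys(StrEnum):
    """Key names for input_map wiring, to avoid magic strings."""

    RETRIEVAL_TOP_K = "retrieval_top_k"
    ENABLE_RERANKING = "enable_reranking"


class MyCustomRuntimeConfig(BaseModel):
    """Runtime configuration; field names, types, and defaults should
    match the preset config so preset values flow through."""

    retrieval_top_k: int = Field(default=20, ge=1, le=100)
    enable_reranking: bool = Field(default=True)
```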

Key Points:

  • RuntimeConfig makes PresetConfig values accessible during pipeline execution

  • Fields should match your PresetConfig fields (same names, types, defaults)

  • Values come from PresetConfig by default, can be overridden per request

  • Often includes a companion StrEnum for key names to avoid magic strings

Relationship with PresetConfig:

What happens:

  1. User sets retrieval_top_k = 30 in Admin Dashboard (stored in PresetConfig)

  2. During execution, RuntimeConfig gets value 30 from PresetConfig

  3. Component accesses retrieval_top_k = 30 via input_map

  4. User can override to retrieval_top_k = 10 in API request

  5. For that request, component gets 10 instead of 30

Registering Runtime Config

Register it in pipeline.py:
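As with the preset config, registration is a module attribute assignment, using the additional_config_class attribute shown at the top of this guide:

```python
# pipeline.py -- runtime config is registered via additional_config_class
from glchat_be.config.pipeline.my_custom_pipeline.runtime_config import MyCustomRuntimeConfig

additional_config_class = MyCustomRuntimeConfig
```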

Accessing Runtime Config (and PresetConfig values during execution)

Runtime config values, along with state values and general config values, are merged into a single state dictionary during pipeline execution. You access these values through the input_map in your pipeline steps.

This is how you access PresetConfig values during pipeline execution!

How the merged state works:

The pipeline execution merges three sources into one state dictionary:

  1. State (from build_initial_state() return)

  2. Runtime config (ONLY fields defined in RuntimeConfig class)

    • Values come from PresetConfig defaults

    • Can be overridden per API request

  3. General config (from GeneralPipelineConfig)

Critical Points:

  • ✅ Only fields defined in RuntimeConfig are merged into execution state

  • ❌ PresetConfig fields NOT in RuntimeConfig are NOT accessible in steps

  • ✅ RuntimeConfig fields get their default values from PresetConfig

  • ⚠️ Avoid duplicate keys across these three sources!

Accessing values in pipeline steps:
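An illustrative input_map in the `"component_parameter": "state_key"` format. The keys on the left are hypothetical component parameters; the state keys on the right show how one map can pull from all three merged sources:

```python
# Illustrative input_map -- left side: component parameters (hypothetical),
# right side: keys in the merged execution state.
input_map = {
    "query": "standalone_query",    # from state (preprocessing output)
    "top_k": "retrieval_top_k",     # from RuntimeConfig (preset default or per-request override)
    "user_id": "user_id",           # from GeneralPipelineConfig
}
```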

Key Points:

  • input_map format: "component_parameter": "state_key"

  • State keys can come from: state dict, runtime config, or general config

  • All merged automatically: Runtime config and general config are automatically merged with state

  • You don't need to manually add runtime config values to state - they're already accessible

  • Avoid duplicates: Ensure keys don't conflict across state, runtime config, and general config

Example mapping sources:

Note: Runtime config values (from PresetConfig/RuntimeConfig) and general config values are automatically merged into the state. You can access them directly via input_map without adding them to build_initial_state() return.

General Pipeline Config (Available to All Pipelines)

All pipelines have access to GeneralPipelineConfig, which contains common runtime information such as request_id, user_id, and conversation_id.

Accessing General Pipeline Config:
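Because GeneralPipelineConfig fields are merged into the execution state automatically, a step can map them directly. The component parameter names on the left are hypothetical; the field names on the right come from the list below:

```python
# Illustrative input_map pulling GeneralPipelineConfig fields; the
# parameter names on the left are hypothetical.
input_map = {
    "uid": "user_id",
    "conversation": "conversation_id",
    "kb": "knowledge_base_id",
}
```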

Important: Values returned from build_initial_state() are merged with runtime config and general config. During pipeline execution, all these sources are combined into one state dictionary accessible via input_map.

Merged State = State + RuntimeConfig + GeneralConfig

Common GeneralPipelineConfig Fields:

  • User Context: user_id, conversation_id, source

  • Knowledge Base: knowledge_base_id, connectors, attachments

  • Search Settings: search_type, normal_search_top_k, smart_search_top_k, web_search_top_k

  • System Settings: enable_guardrails, use_memory, augment_context

  • Model Config: model_name, model_kwargs, hyperparameters

See glchat_be/config/pipeline/general_pipeline_config.py for the complete list of available fields.

Pipeline Execution Flow

When a user sends a message, your custom pipeline is not the only thing that executes. GLChat runs a series of automated stages before and after your pipeline to handle common tasks like guardrails, preprocessing, and postprocessing.

Understanding this flow is crucial for building effective pipelines.

Complete Pipeline Architecture

Key Point: You only need to implement Your Pipeline. Everything else is handled automatically by the system.

Stages Before Your Pipeline

1. Guardrails (Optional)

  • Checks user input for harmful or disallowed content

  • If detected, stops the entire pipeline before execution

  • Can be disabled from Admin Dashboard

2. DPO (Document Processing Orchestrator) (Optional)

  • Processes uploaded files (PDFs, images, etc.) from chat UI

  • Preprocessing later decides whether to use DPO output or native model handling

  • Can be disabled from Admin Dashboard

3. Preprocessing (Automatic)

This is where the "auto-magic" happens. Preprocessing handles:

  1. Retrieve Chat History: Pulls all previous messages, metadata, and attachments

  2. Process Attachments:

    • If model can read file directly (e.g., GPT-4 with vision) → use it

    • If not → use DPO's processed version

    • If neither can handle it → skip gracefully

  3. Anonymize User Query (if enabled): Masks PII and stores as masked_user_query

  4. Generate Standalone Query: Creates a condensed query based on user input + recent messages

  5. Check Cache (if enabled): Returns cached response instantly if found (sets cache_hit = True)

  6. Retrieve Memory (if enabled): Fetches user memory from past conversations

All preprocessing output is passed to your pipeline via previous_state.

4. Router (Automatic)

  • Forwards preprocessed data to your pipeline

  • No configuration needed

Your Pipeline

This is where you focus! Your custom pipeline receives previous_state containing all the preprocessed data.

Important: Your build_initial_state() must accept previous_state:
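A minimal sketch of the signature. Only the previous_state parameter is required by this guide; the other parameter names are assumptions:

```python
# Minimal sketch -- parameter names other than previous_state are assumptions.
def build_initial_state(pipeline_config: dict, previous_state: dict, **kwargs) -> dict:
    return {
        # Carry over preprocessing output
        "user_query": previous_state["user_query"],
        "standalone_query": previous_state["standalone_query"],
        # ... your pipeline-specific state keys here ...
        # Fields required by postprocessing
        "events": [],
        "related": [],
        "response": "",
    }
```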

Available in previous_state:

  • user_query: Original user message

  • standalone_query: Condensed query for retrieval

  • masked_user_query: Anonymized query (if anonymization enabled)

  • chat_history: List of previous messages

  • attachments: Processed attachments

  • cache_hit: Whether cache was found

  • memory: Retrieved user memory (if enabled)

  • event_emitter: Event emitter for streaming

  • And more... (see PreprocessingState)

Stages After Your Pipeline

5. Postprocessing (Automatic)

After your pipeline completes, postprocessing handles:

  1. Save Cache (if enabled): Stores response for future cache hits

  2. Save Chat History: Saves the Q&A pair including metadata, attachments, PII mappings

  3. Save Memory (if enabled): Stores the conversation if considered meaningful

Required State Fields for Postprocessing:

Your pipeline state MUST include these fields for postprocessing to work:

| Field | Type | Description |
| --- | --- | --- |
| events | list[dict[str, Any]] | Events for thinking & activity tracking. Empty list if none. |
| related | list[str] | Related topics or concepts. Empty list if none. |
| response | str | The response to the user. |

Most other required fields are already provided by preprocessing - just pass them through!

Example: Complete Pipeline with Preprocessing
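A condensed end-to-end sketch. The function signatures and the way build() constructs components are assumptions; only the state keys come from this guide:

```python
# Condensed sketch -- signatures and component construction are assumptions.
async def build(pipeline_config: dict, **kwargs):
    top_k = pipeline_config.get("retrieval_top_k", 20)  # preset value at build time
    # ... construct components and steps here, wiring them with input_map ...


def build_initial_state(pipeline_config: dict, previous_state: dict, **kwargs) -> dict:
    return {
        # Pass through preprocessing output
        "user_query": previous_state["user_query"],
        "standalone_query": previous_state["standalone_query"],
        "chat_history": previous_state.get("chat_history", []),
        "event_emitter": previous_state.get("event_emitter"),
        # Fields required by postprocessing
        "events": [],
        "related": [],
        "response": "",
    }
```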

Benefits of This Architecture

  • Less boilerplate: No need to implement chat history retrieval, attachment processing, etc.

  • Automatic caching: Cache checking and saving handled for you

  • Built-in PII protection: Anonymization handled if enabled

  • Memory support: User memory retrieval and saving automatic

  • Focus on logic: Spend time on your custom pipeline behavior, not infrastructure

Pipeline Config Resolver Utility

GLChat provides a PipelineConfigResolver utility class that simplifies access to common pipeline configurations like LLM models, embeddings, and other frequently used settings.

What is PipelineConfigResolver?

PipelineConfigResolver is a helper class that:

  • Provides easy access to common pipeline configurations

  • Handles default values automatically

  • Lazily initializes expensive resources (LM invoker, EM invoker, etc.)

  • Validates configuration values

Using PipelineConfigResolver

Import and initialize:
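A sketch of initialization inside build(). The import path and constructor arguments are assumptions; only the class name and its properties come from this guide:

```python
# Import path and constructor arguments are assumptions.
from glchat_be.utils.pipeline_config_resolver import PipelineConfigResolver


async def build(pipeline_config: dict, **kwargs):
    resolver = PipelineConfigResolver(pipeline_config)
    lm_invoker = resolver.lm_invoker  # lazily initialized on first access
```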

Available Properties

Model Configuration:

  • model_name: LLM model name (e.g., "openai/gpt-4")

  • model_kwargs: Model-specific kwargs

  • model_env_kwargs: Environment-specific model kwargs

  • model_config: ModelConfig tuple (name, kwargs, env_kwargs)

Vectorizer/Embeddings:

  • vectorizer_config: VectorizerConfig (name, model, kwargs)

  • em_invoker: Embedding model invoker (BaseEMInvoker)

  • langchain_embeddings: Langchain-compatible embeddings (Embeddings)

Invokers:

  • lm_invoker: Language model invoker (BaseLMInvoker)

  • em_invoker: Embedding model invoker (BaseEMInvoker)

Pipeline Settings:

  • prompt_context_char_threshold: Character limit for prompt context

  • chat_history_limit: Maximum chat history messages

  • reference_formatter_batch_size: Batch size for reference formatting

  • reference_formatter_threshold: Threshold for reference formatter

  • strategy_batch_size: Batch size for generation strategy

  • generation_strategy: Generation strategy ("stuff" or "refine")

  • enable_guardrails: Whether guardrails are enabled

  • support_multimodal: Whether multimodal is supported

Retrievers and Rerankers:

  • kb_retriever: Knowledge base retriever (BasicVectorRetriever)

  • reranker: Reranker instance (BaseReranker | None)

Other:

  • modality_converter: Modality converter for multimodal (BaseModalityConverter | None)

  • rago_pipeline: RAGO pipeline name

  • pipeline_preset_id: Pipeline preset ID

Example: Using PipelineConfigResolver
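The property names below come from the "Available Properties" list above; the constructor call is an assumption:

```python
# Constructor call assumed; property names from "Available Properties".
resolver = PipelineConfigResolver(pipeline_config)

lm_invoker = resolver.lm_invoker            # BaseLMInvoker
embeddings = resolver.langchain_embeddings  # Langchain-compatible Embeddings
retriever = resolver.kb_retriever           # BasicVectorRetriever
reranker = resolver.reranker                # BaseReranker | None
history_limit = resolver.chat_history_limit
strategy = resolver.generation_strategy     # "stuff" or "refine"
```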

Benefits of Using PipelineConfigResolver

  • Cleaner code: No need to manually extract and validate config values

  • Default values: Automatically handles missing configurations with sensible defaults

  • Type safety: Returns properly typed objects (invokers, embeddings, etc.)

  • Lazy loading: Expensive resources only initialized when accessed

  • Consistency: Use the same configuration patterns across all pipelines

Direct Access vs PipelineConfigResolver

Without PipelineConfigResolver:
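Manual extraction means handling every default and type conversion yourself. The default values below are placeholders, not the system's real defaults:

```python
pipeline_config = {"model_name": "openai/gpt-4"}  # illustrative input

# Every default and type conversion handled by hand (placeholder defaults):
model_name = pipeline_config.get("model_name") or "openai/gpt-4"
model_kwargs = pipeline_config.get("model_kwargs") or {}
chat_history_limit = int(pipeline_config.get("chat_history_limit", 10))
generation_strategy = pipeline_config.get("generation_strategy", "stuff")
```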

With PipelineConfigResolver:
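The resolver handles defaults and typing internally, so the same lookups collapse to property access (constructor call assumed):

```python
resolver = PipelineConfigResolver(pipeline_config)  # constructor assumed

model_name = resolver.model_name              # defaults handled internally
model_kwargs = resolver.model_kwargs
chat_history_limit = resolver.chat_history_limit
generation_strategy = resolver.generation_strategy
```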

Building Tools

Tools allow you to expose your entire pipeline as a callable component via API. This enables your pipeline to be used as a tool within agent workflows or called directly through the /components/{component_id}/run endpoint.

How Tools Work

When you implement build_tools(), your pipeline becomes accessible as a tool:

  1. Your pipeline is converted to a tool using .as_tool()

  2. Wrapped in a ToolProcessor for input/output processing

  3. Exposed via API endpoint: POST /components/{chatbot_id}:{tool_name}/run

Creating Tools

Implement the build_tools() function in your pipeline.py:
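A sketch of the three pieces described under "Tool Components". The ToolProcessor import path and its exact method signatures are assumptions; _input_type, _context_schema, and .as_tool() are named in this guide:

```python
# Sketch -- ToolProcessor import path and method signatures are assumptions.
from pydantic import BaseModel, Field

from glchat_be.config.pipeline.general_pipeline_config import GeneralPipelineConfig
from glchat_be.tools import ToolProcessor  # import path assumed


class MyPipelineToolInput(BaseModel):
    """Input schema validated on each /components/.../run request."""

    query: str = Field(description="User query to run through the pipeline.")


class MyPipelineToolProcessor(ToolProcessor):
    def preprocess(self, inputs: dict) -> dict:
        # Validate inputs and merge any required config before execution.
        return inputs

    def postprocess(self, output: dict) -> dict:
        # Keep only the fields callers need.
        return {"response": output.get("response", "")}


async def build_tools(pipeline_config: dict, **kwargs) -> list:
    pipeline = await build(pipeline_config, **kwargs)  # your existing build()
    pipeline._input_type = MyPipelineToolInput
    pipeline._context_schema = GeneralPipelineConfig
    tool = pipeline.as_tool(name="my_custom_pipeline")
    return [MyPipelineToolProcessor(tool)]
```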

Tool API Endpoint

Once you implement build_tools(), your pipeline is automatically exposed via API:

Endpoint:

Example Request:
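A hypothetical request against the `/components/{chatbot_id}:{tool_name}/run` endpoint; the host, chatbot ID, tool name, and payload fields are all placeholders:

```shell
# Hypothetical request -- host, IDs, and payload fields are placeholders.
curl -X POST "https://<glchat-host>/components/my-chatbot:my_custom_pipeline/run" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the refund policy?"}'
```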

Response:

Tool Components

1. Input Schema (MyPipelineToolInput)

  • Defines expected input fields

  • Uses Pydantic for validation

  • Documents each field with description

2. Tool Processor (MyPipelineToolProcessor)

  • preprocess(): Validates inputs, adds required fields, merges config

  • postprocess(): Formats output, extracts relevant fields

3. Build Tools Function

  • Creates pipeline using build()

  • Sets _input_type and _context_schema

  • Converts pipeline to tool with .as_tool()

  • Wraps in ToolProcessor

  • Returns list of tool processors

Key Requirements

Input Type:

  • Must be a Pydantic BaseModel

  • Set via pipeline._input_type

  • Validates API request inputs

Context Schema:

  • Typically GeneralPipelineConfig

  • Set via pipeline._context_schema

  • Provides runtime context (user_id, conversation_id, etc.)

ToolProcessor:

  • Must extend ToolProcessor base class

  • Implement preprocess() for input transformation

  • Implement postprocess() for output transformation

  • Wraps the tool for API exposure

Configuration Best Practices

When to Use Preset Config

Use preset configuration to define custom configuration fields via database migration:

  • Add new custom fields to the system

  • Set default values for your custom fields

  • Make fields configurable in Admin Dashboard UI

  • Store configuration at chatbot instance level

Example fields: custom_retrieval_threshold, enable_custom_reranking, custom_batch_size

Where accessible: build(), build_initial_state(), build_tools() only

When values are set: Configuration time (Admin Dashboard)

When to Use Runtime Config

Use runtime configuration to make PresetConfig fields accessible during pipeline execution:

  • Make custom fields accessible in pipeline steps (via input_map)

  • Allow per-request overrides of preset values

  • Enable dynamic parameter changes per API request

  • Control which fields are exposed at execution time

Example fields: Same as PresetConfig fields you want accessible during execution

Where accessible: Everywhere - build(), build_initial_state(), build_tools(), AND pipeline execution steps

When values are set:

  • Defaults from PresetConfig (configuration time)

  • Can be overridden per request (execution time)

Important Rule:

  • PresetConfig only: Field accessible only during build time

  • PresetConfig + RuntimeConfig: Field accessible during build time AND execution time

  • RuntimeConfig acts as a "gate" - only fields defined in RuntimeConfig are available in execution steps

Comparison Table

| Aspect | PresetConfig | RuntimeConfig | GeneralPipelineConfig |
| --- | --- | --- | --- |
| When values are set | Configuration time (building chatbot) | Execution time (per request) | Execution time (per request) |
| Accessible in build() | ✅ Yes (via pipeline_config) | ✅ Yes (via pipeline_config) | ❌ No (use in build_initial_state) |
| Accessible in execution (steps) | ❌ No (unless also in RuntimeConfig) | ✅ Yes (via input_map) | ✅ Yes (via input_map) |
| Purpose | Define custom config fields with defaults | Make fields accessible during execution + allow overrides | System-wide common config |
| Source | Database migration + class definition | Extracts from pipeline_config | Provided by system |
| When to create | When adding custom fields | When you need fields accessible in execution | Always available (no need to create) |
| Typical fields | Custom thresholds, custom features | Same fields as PresetConfig (for execution access) | User context, chat history, system settings |
| Example | custom_threshold: float = 0.7 | custom_threshold: float = 0.7 (makes it accessible in steps) | user_id: str, conversation_id: str |

Key Differences:

  • PresetConfig: Values accessible only in build(), build_initial_state(), build_tools()

  • RuntimeConfig: Makes PresetConfig values accessible during execution + allows per-request overrides

  • GeneralPipelineConfig: System-wide fields always accessible during execution

  • To use a PresetConfig field in steps: You MUST also define it in RuntimeConfig

Examples

Example: Custom RAG Pipeline with PresetConfig + RuntimeConfig

Scenario: You're adding custom fields for retrieval and reranking configuration, and you want them accessible during pipeline execution.

Step 1: Add to migration

Step 2: Create PresetConfig

Step 3: Create RuntimeConfig (REQUIRED for execution access)

Step 4: Use in pipeline
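The four steps above, condensed into one sketch. The migration dict shape, module layout, class names, and field names are all illustrative:

```python
# Step 1 -- migration: field definitions stored in the database
# (dict shape per "Field definition" above; exact migration schema assumed)
CUSTOM_FIELDS = {
    "custom_retrieval_top_k": {"type": "int", "default_value": 20, "ui_type": "number"},
    "custom_rerank_threshold": {"type": "float", "default_value": 0.7, "ui_type": "number"},
}

# Step 2 -- preset_config.py: matching names, types, and defaults
from pydantic import BaseModel, Field


class MyRagPresetConfig(BaseModel):
    custom_retrieval_top_k: int = Field(default=20, ge=1)
    custom_rerank_threshold: float = Field(default=0.7, ge=0.0, le=1.0)


# Step 3 -- runtime_config.py: same fields, so they reach execution steps
class MyRagRuntimeConfig(BaseModel):
    custom_retrieval_top_k: int = Field(default=20, ge=1)
    custom_rerank_threshold: float = Field(default=0.7, ge=0.0, le=1.0)


# Step 4 -- pipeline.py: map the fields into a step via input_map
# (parameter names on the left are hypothetical)
input_map = {
    "top_k": "custom_retrieval_top_k",
    "threshold": "custom_rerank_threshold",
}
```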

Why both are needed:

  • PresetConfig: Defines fields in DB + Admin Dashboard UI + provides defaults

  • RuntimeConfig: Makes fields accessible in execution steps + allows overrides

Example: Pipeline with RuntimeConfig

Scenario: You want to allow runtime overrides for your custom preset values.

How it works:

  1. User sets custom_retrieval_top_k = 10 in Admin Dashboard (PresetConfig)

  2. By default, pipeline uses 10 when executed

  3. User can override to custom_retrieval_top_k = 5 in API request (RuntimeConfig)

  4. For that specific request, the component receives 5 via input_map

  5. Next request without override uses 10 again (back to preset)

Example: Using GeneralPipelineConfig

Scenario: You need access to user context and chat history.

Example: PresetConfig vs RuntimeConfig Flow

Complete workflow showing how preset and runtime values work together:

Step 1: Define PresetConfig (values set at configuration time)

Step 2: Define RuntimeConfig (same fields, can be overridden at runtime)

Step 3: Use in pipeline
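The three steps above, sketched for the retrieval_top_k value used in the execution flow below; class names are illustrative:

```python
from pydantic import BaseModel, Field


# Step 1 -- preset: default set at configuration time (Admin Dashboard)
class FlowPresetConfig(BaseModel):
    retrieval_top_k: int = Field(default=20, ge=1, le=100)


# Step 2 -- runtime: same field, overridable per API request
class FlowRuntimeConfig(BaseModel):
    retrieval_top_k: int = Field(default=20, ge=1, le=100)


# Step 3 -- a pipeline step reads it through input_map
input_map = {"top_k": "retrieval_top_k"}
```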

Execution Flow:

  1. Admin Dashboard: User sets retrieval_top_k = 20 (saved as preset)

  2. API Request 1 (no override): Component receives retrieval_top_k = 20 via input_map

  3. API Request 2 (with override retrieval_top_k = 5): Component receives 5 via input_map

  4. API Request 3 (no override): Component receives 20 again via input_map

Understanding the Merged State

The merged state is the core mechanism for passing data through your pipeline. Here's how it works:

Stage 1: Build Time (build() function)

  • Access pipeline_config for conditional logic and component creation

  • Use pipeline_config.get("key") to read preset/runtime values

  • Create pipeline steps with input_map

Stage 2: Execution Time (pipeline running)

  • System automatically merges three sources into one state dict:

    1. Values from build_initial_state() return

    2. Runtime config values (PresetConfig + RuntimeConfig overrides)

    3. General config values (GeneralPipelineConfig)

  • Components receive values via input_map keys

  • All sources are accessible through the same key namespace

Visual Example:
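An illustrative merge using plain dicts. The keys and values are hypothetical; in practice the system performs this merge for you:

```python
# Illustrative merge -- the system does this automatically at execution time.
state = {"user_query": "hi", "documents": []}                   # build_initial_state()
runtime_config = {"retrieval_top_k": 20}                        # RuntimeConfig (preset default)
general_config = {"user_id": "u-1", "conversation_id": "c-1"}   # GeneralPipelineConfig

merged_state = {**state, **runtime_config, **general_config}
# A step with input_map={"top_k": "retrieval_top_k"} now receives 20.
```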

Key Takeaway: Use input_map to access any value from the merged state - you don't need to know which source it came from!

Critical Rule: Only fields defined in RuntimeConfig (or returned from build_initial_state(), or from GeneralPipelineConfig) are available in the merged state. PresetConfig fields are NOT automatically merged - you must also define them in RuntimeConfig to make them accessible during execution.
