Function Parameters Reference

Comprehensive reference for all parameters in the required pipeline functions.


build() Function Parameters

Function Signature

```python
async def build(
    pipeline_config: dict[str, Any],
    prompt_builder_catalogs: dict[str, BaseCatalog[Any]] | None = None,
    lmrp_catalogs: dict[str, BaseCatalog[Any]] | None = None,
) -> Pipeline:
```

pipeline_config: dict[str, Any]

Pipeline configuration including model settings and pipeline-specific configurations (preset + runtime config).

Model Configuration

| Field | Type | Description |
| --- | --- | --- |
| model_name | str | Language model identifier (e.g., "openai/gpt-4", "gemini-pro") |
| model_kwargs | dict | Model parameters (temperature, max_tokens, etc.) |
| model_env_kwargs | dict | Environment settings (API keys, endpoints) |
| vectorizer_kwargs | dict | Embedding model configuration and parameters |

Retrieval Settings

| Field | Type | Description |
| --- | --- | --- |
| normal_search_top_k | int | Number of chunks for normal search (≥ 1) |
| smart_search_top_k | int | Number of chunks for smart search (≥ 1) |
| web_search_top_k | int | Number of chunks for web search (≥ 1) |
| vector_weight | float | Weight for vector similarity in hybrid search (0.0-1.0) |
| rerank_type | str | Type of reranking method to apply |
| rerank_kwargs | str | Reranking parameters (JSON string) |
| enable_mmr | bool | Enable Maximal Marginal Relevance (MMR) reranking |
| fetch_k | int | Number of candidates to fetch for MMR |
| lambda_mult | float | MMR diversity parameter, 0.0-1.0 (0 = max diversity, 1 = max relevance) |

Pipeline Behavior

| Field | Type | Description |
| --- | --- | --- |
| augment_context | bool | Whether to augment context from the knowledge base |
| use_model_knowledge | bool | Whether to allow the model to use its built-in knowledge |
| use_cache | bool | Whether to check and use cached responses |
| chat_history_limit | int | Maximum number of history messages to include |
| prompt_context_char_threshold | int | Character limit for context in prompts |
| support_multimodal | bool | Whether the pipeline supports multimodal inputs |

Response Formatting

| Field | Type | Description |
| --- | --- | --- |
| reference_formatter_type | str | Type of reference formatter ("lm", "none") |
| reference_formatter_threshold | float | Threshold for reference formatting (0.0-1.0) |
| reference_formatter_batch_size | int | Batch size for reference formatting (≥ 1) |
| generation_strategy | str | Response generation strategy ("stuff" or "refine") |
| strategy_batch_size | int | Batch size for the generation strategy (≥ 1) |
| repacker_method | str | Chunk ordering method ("forward", "reverse", "sides") |

Advanced Features

| Field | Type | Description |
| --- | --- | --- |
| enable_personalization | bool | Whether to enable response personalization |
| enable_guardrails | bool | Whether to enable content guardrails |
| guardrail_mode | str | Guardrail mode configuration |
| anonymize_em | bool | Whether to anonymize before using the embedding model |
| anonymize_lm | bool | Whether to anonymize before using the language model |

Usage Example
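A minimal sketch of a pipeline_config passed to build(). The keys come from the tables above; the values (and the commented call) are illustrative assumptions, not recommended defaults.

```python
from typing import Any

# Illustrative pipeline_config; keys are documented above,
# values here are example settings only.
pipeline_config: dict[str, Any] = {
    "model_name": "openai/gpt-4",
    "model_kwargs": {"temperature": 0.2, "max_tokens": 1024},
    "normal_search_top_k": 5,
    "vector_weight": 0.7,
    "enable_mmr": True,
    "lambda_mult": 0.5,
    "augment_context": True,
    "use_cache": True,
    "chat_history_limit": 10,
}

# Inside an async context:
# pipeline = await build(pipeline_config)
```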


prompt_builder_catalogs: dict[str, BaseCatalog[Any]] | None

Dictionary mapping catalog identifiers to BaseCatalog instances containing prompt builder configurations.

Structure

  • Key: Catalog identifier (e.g., "standard_rag", "no_op", scope name)

  • Value: BaseCatalog[Any] instance containing prompt templates and configurations

Purpose

  • Provides access to prompt templates for different use cases

  • Allows dynamic prompt selection based on pipeline scope or model

  • Enables prompt customization without code changes

When to Use

  • When building response synthesizers or generation components

  • When you need to construct prompts for language model calls

  • When implementing custom prompt logic

Usage Example


lmrp_catalogs: dict[str, BaseCatalog[Any]] | None

Dictionary mapping catalog identifiers to BaseCatalog instances containing Language Model Request Processor (LMRP) configurations.

Structure

  • Key: Catalog identifier (e.g., prompt name, use case identifier)

  • Value: BaseCatalog[Any] instance containing LMRP configurations

Purpose

  • Provides access to LMRP configurations for query transformation

  • Enables query rewriting, expansion, or refinement

  • Supports different processing strategies for different use cases

When to Use

  • When building retrieval pipelines that need query transformation

  • When implementing query rewriting or expansion steps

  • When you need to process queries before retrieval or generation

Usage Example


build_initial_state() Function Parameters

Function Signature


request: dict[str, Any]

User request data and conversation context. This is the raw input from the API/UI.

Message Content

| Field | Type | Description |
| --- | --- | --- |
| message | str | User's input message/text query |
| original_message | str | Original message without any preprocessing |
| binaries | list | Binary data objects (images, files, etc.) |
| user_multimodal_contents | list | Multimodal content from the user |

Conversation Context

| Field | Type | Description |
| --- | --- | --- |
| conversation_id | str | Identifier for the conversation thread |
| user_id | str | Identifier for the user making the request |
| parent_id | str | ID of the parent message (for threaded conversations) |
| user_message_id | str | Unique identifier for this user message |
| assistant_message_id | str | Unique identifier for the assistant's response |
| chat_history | list[Message] | Previous messages in the conversation |
| last_message_id | str | ID of the last message in the conversation |

Knowledge Base & Retrieval

| Field | Type | Description |
| --- | --- | --- |
| knowledge_base_id | str | Identifier for the knowledge base to retrieve from |
| search_type | str | Type of search ("normal", "smart", "hybrid", "search") |
| connectors | list[str] | List of connector identifiers to use |
| connector_user_token | str | Authentication token for connectors |
| filters | list[dict] | Metadata filters for retrieval |

Attachments

| Field | Type | Description |
| --- | --- | --- |
| attachments | dict | User-provided file attachments |
| attachment_chunk_size | int | Chunk size for attachments |
| use_docproc | bool | Whether to use the document processing orchestrator |
| model_supported_attachments | dict | Model-specific attachment configurations |

Metadata

| Field | Type | Description |
| --- | --- | --- |
| source | str | Source of the request (e.g., "web", "api", "mobile") |
| lang_id | str | Language identifier (e.g., "en", "id") |
| start_time | float | Timestamp when the request was initiated |
| quote | str | Quoted message text if replying to a specific message |
| client_type | str | Type of client making the request |

Request Flags

| Field | Type | Description |
| --- | --- | --- |
| is_regenerated | bool | Whether the message was regenerated |
| is_edited | bool | Whether the message was edited |
| is_retried | bool | Whether the message was retried |

Advanced

| Field | Type | Description |
| --- | --- | --- |
| hyperparameters | dict | Dynamic hyperparameters |
| personalization_details | dict | User personalization data |
| field_of_interests | list[str] | User's fields of interest |

Usage Example
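An illustrative request payload. The keys come from the tables above; the values are examples only, and optional fields are omitted.

```python
from typing import Any

# Example request as it might arrive from the API/UI.
request: dict[str, Any] = {
    "message": "What is our refund policy?",
    "conversation_id": "conv-123",
    "user_id": "user-456",
    "knowledge_base_id": "kb-789",
    "search_type": "normal",
    "chat_history": [],
    "lang_id": "en",
    "is_regenerated": False,
}
```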


pipeline_config: dict[str, Any]

Same as in the build() function. Contains pipeline configuration including model settings and pipeline-specific configurations (preset + runtime config).

See the build() function's pipeline_config parameter for detailed field descriptions.

Usage in build_initial_state()
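A sketch of reading retrieval settings from pipeline_config inside build_initial_state(). The helper read_retrieval_settings and the fallback defaults are illustrative assumptions, not documented values.

```python
from typing import Any

def read_retrieval_settings(pipeline_config: dict[str, Any]) -> dict[str, Any]:
    """Pull commonly used knobs from the config, with assumed defaults."""
    return {
        "top_k": pipeline_config.get("normal_search_top_k", 5),
        "augment_context": pipeline_config.get("augment_context", True),
        "history_limit": pipeline_config.get("chat_history_limit", 10),
    }

settings = read_retrieval_settings({"normal_search_top_k": 3})
```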


previous_state: dict[str, Any] | None

The state from the previous pipeline stage (typically from preprocessing). This allows your pipeline to access data that was already processed.

Common Fields from Preprocessing

Query Processing

| Field | Type | Description |
| --- | --- | --- |
| user_query | str | The processed user query |
| masked_user_query | str | Query with PII masked/anonymized |
| generation_query | str | Query prepared for generation |
| standalone_query | str | Standalone query extracted from conversation context |
| retrieval_query | str | Query prepared for retrieval |

History & Context

| Field | Type | Description |
| --- | --- | --- |
| history | list[Message] | Conversation history as Message objects |
| augmented_history | list[Message] | History with additional context |
| transformed_history | list[Message] | History after transformations |

Event & State Management

| Field | Type | Description |
| --- | --- | --- |
| event_emitter | EventEmitter | Event emitter for pipeline events |
| events | list[dict] | List of events emitted so far |
| cache_hit | bool | Whether the response came from cache |

Anonymization

| Field | Type | Description |
| --- | --- | --- |
| anonymized_mappings | list[AnonymizerMapping] | PII anonymization mappings |

Media

| Field | Type | Description |
| --- | --- | --- |
| media_mapping | dict | Mapping of media references |
| chat_history_media_mapping | dict | Media mappings from chat history |

Memory

| Field | Type | Description |
| --- | --- | --- |
| memory_identifier | dict | Identifiers for memory retrieval |
| memory_results | list[Chunk] | Retrieved memory chunks |

Other

| Field | Type | Description |
| --- | --- | --- |
| steps | list[dict] | Steps taken so far in processing |
| extra_contents | list[MessageContent] | Additional content to pass to the model |
| response | str | Current response (may be empty initially) |

Usage Example


**kwargs: Any

Additional keyword arguments that may be passed by the pipeline execution framework. These are context-specific and may vary.

Common kwargs Fields

| Field | Type | Description |
| --- | --- | --- |
| event_emitter | EventEmitter | Event emitter instance (if not in previous_state) |
| last_message_id | str | ID of the last message in the conversation |
| organization_id | str | Organization identifier for multi-tenancy |

Usage Example


Helper Functions

get_retrieval_params()

Extracts retrieval parameters from pipeline config and request.

get_prompt_builder_by_scope()

Gets prompt builder for a specific scope.

get_lmrp_by_scope()

Gets LMRP (Language Model Request Processor) for a specific scope.

get_lm_invoker()

Creates a language model invoker.

get_embeddings()

Creates an embeddings model.


Quick Reference

Most Commonly Used Fields

From pipeline_config:

  • model_name - Which model to use

  • model_kwargs - Model parameters

  • normal_search_top_k / smart_search_top_k - Retrieval counts

  • augment_context - Enable/disable retrieval

  • chat_history_limit - History limit

From request:

  • message - User's query

  • conversation_id - Conversation ID

  • user_id - User ID

  • knowledge_base_id - KB to search

  • search_type - Search method

From previous_state:

  • user_query - Processed query

  • event_emitter - For streaming

  • history - Chat history

  • cache_hit - Cache status
