Function Parameters Reference

Comprehensive reference for all parameters in the required pipeline functions.


build() Function Parameters

Function Signature

```python
async def build(
    pipeline_config: dict[str, Any],
    prompt_builder_catalogs: dict[str, BaseCatalog[Any]] | None = None,
    lmrp_catalogs: dict[str, BaseCatalog[Any]] | None = None,
) -> Pipeline:
```

pipeline_config: dict[str, Any]

Pipeline configuration including model settings and pipeline-specific configurations (preset + runtime config).

Model Configuration

| Field | Type | Description |
| --- | --- | --- |
| model_name | str | Language model identifier (e.g., "openai/gpt-4", "gemini-pro") |
| model_kwargs | dict | Model parameters (temperature, max_tokens, etc.) |
| model_env_kwargs | dict | Environment settings (API keys, endpoints) |
| vectorizer_kwargs | dict | Embedding model configuration and parameters |

Retrieval Settings

| Field | Type | Description |
| --- | --- | --- |
| normal_search_top_k | int | Number of chunks for normal search (≥ 1) |
| smart_search_top_k | int | Number of chunks for smart search (≥ 1) |
| web_search_top_k | int | Number of chunks for web search (≥ 1) |
| vector_weight | float | Weight for vector similarity in hybrid search (0.0-1.0) |
| rerank_type | str | Type of reranking method to apply |
| rerank_kwargs | str | Reranking parameters (JSON string) |
| enable_mmr | bool | Enable Maximal Marginal Relevance (MMR) reranking |
| fetch_k | int | Number of candidates to fetch for MMR |
| lambda_mult | float | MMR diversity parameter, 0.0-1.0 (0 = max diversity, 1 = max relevance) |

Pipeline Behavior

| Field | Type | Description |
| --- | --- | --- |
| augment_context | bool | Whether to augment context from the knowledge base |
| use_model_knowledge | bool | Whether to allow the model to use its built-in knowledge |
| use_cache | bool | Whether to check and use cached responses |
| chat_history_limit | int | Maximum number of history messages to include |
| prompt_context_char_threshold | int | Character limit for context in prompts |
| support_multimodal | bool | Whether the pipeline supports multimodal inputs |

Response Formatting

| Field | Type | Description |
| --- | --- | --- |
| reference_formatter_type | str | Type of reference formatter ("lm", "none") |
| reference_formatter_threshold | float | Threshold for reference formatting (0.0-1.0) |
| reference_formatter_batch_size | int | Batch size for reference formatting (≥ 1) |
| generation_strategy | str | Response generation strategy ("stuff" or "refine") |
| strategy_batch_size | int | Batch size for the generation strategy (≥ 1) |
| repacker_method | str | Chunk ordering method ("forward", "reverse", "sides") |

Advanced Features

| Field | Type | Description |
| --- | --- | --- |
| enable_personalization | bool | Whether to enable response personalization |
| enable_guardrails | bool | Whether to enable content guardrails |
| guardrail_mode | str | Guardrail mode configuration |
| anonymize_em | bool | Whether to anonymize before using the embedding model |
| anonymize_lm | bool | Whether to anonymize before using the language model |

Usage Example
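A minimal sketch of a pipeline_config passed to build(). The keys come from the tables above; the values (and the commented call) are illustrative assumptions, not recommended defaults.

```python
from typing import Any

# Illustrative pipeline_config; keys are documented above,
# values here are example settings only.
pipeline_config: dict[str, Any] = {
    "model_name": "openai/gpt-4",
    "model_kwargs": {"temperature": 0.2, "max_tokens": 1024},
    "normal_search_top_k": 5,
    "vector_weight": 0.7,
    "enable_mmr": True,
    "lambda_mult": 0.5,
    "augment_context": True,
    "use_cache": True,
    "chat_history_limit": 10,
}

# Inside an async context:
# pipeline = await build(pipeline_config)
```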


prompt_builder_catalogs: dict[str, BaseCatalog[Any]] | None

Dictionary mapping catalog identifiers to BaseCatalog instances containing prompt builder configurations.

Structure

  • Key: Catalog identifier (e.g., "standard_rag", "no_op", scope name)

  • Value: BaseCatalog[Any] instance containing prompt templates and configurations

Purpose

  • Provides access to prompt templates for different use cases

  • Allows dynamic prompt selection based on pipeline scope or model

  • Enables prompt customization without code changes

When to Use

  • When building response synthesizers or generation components

  • When you need to construct prompts for language model calls

  • When implementing custom prompt logic

Usage Example


lmrp_catalogs: dict[str, BaseCatalog[Any]] | None

Dictionary mapping catalog identifiers to BaseCatalog instances containing Language Model Request Processor (LMRP) configurations.

Structure

  • Key: Catalog identifier (e.g., prompt name, use case identifier)

  • Value: BaseCatalog[Any] instance containing LMRP configurations

Purpose

  • Provides access to LMRP configurations for query transformation

  • Enables query rewriting, expansion, or refinement

  • Supports different processing strategies for different use cases

When to Use

  • When building retrieval pipelines that need query transformation

  • When implementing query rewriting or expansion steps

  • When you need to process queries before retrieval or generation

Usage Example


build_initial_state() Function Parameters

Function Signature


request: dict[str, Any]

User request data and conversation context. This is the raw input from the API/UI.

Message Content

| Field | Type | Description |
| --- | --- | --- |
| message | str | User's input message/text query |
| original_message | str | Original message without any preprocessing |
| binaries | list | Binary data objects (images, files, etc.) |
| user_multimodal_contents | list | Multimodal content from the user |

Conversation Context

| Field | Type | Description |
| --- | --- | --- |
| conversation_id | str | Identifier for the conversation thread |
| user_id | str | Identifier for the user making the request |
| parent_id | str | ID of the parent message (for threaded conversations) |
| user_message_id | str | Unique identifier for this user message |
| assistant_message_id | str | Unique identifier for the assistant's response |
| chat_history | list[Message] | Previous messages in the conversation |
| last_message_id | str | ID of the last message in the conversation |

Knowledge Base & Retrieval

| Field | Type | Description |
| --- | --- | --- |
| knowledge_base_id | str | Identifier for the knowledge base to retrieve from |
| search_type | str | Type of search ("normal", "smart", "hybrid", "search") |
| connectors | list[str] | List of connector identifiers to use |
| connector_user_token | str | Authentication token for connectors |
| filters | list[dict] | Metadata filters for retrieval |

Attachments

| Field | Type | Description |
| --- | --- | --- |
| attachments | dict | User-provided file attachments |
| attachment_chunk_size | int | Chunk size for attachments |
| use_docproc | bool | Whether to use the document processing orchestrator |
| model_supported_attachments | dict | Model-specific attachment configurations |

Metadata

| Field | Type | Description |
| --- | --- | --- |
| source | str | Source of the request (e.g., "web", "api", "mobile") |
| lang_id | str | Language identifier (e.g., "en", "id") |
| start_time | float | Timestamp when the request was initiated |
| quote | str | Quoted message text if replying to a specific message |
| client_type | str | Type of client making the request |

Request Flags

| Field | Type | Description |
| --- | --- | --- |
| is_regenerated | bool | Whether the message was regenerated |
| is_edited | bool | Whether the message was edited |
| is_retried | bool | Whether the message was retried |

Advanced

| Field | Type | Description |
| --- | --- | --- |
| hyperparameters | dict | Dynamic hyperparameters |
| personalization_details | dict | User personalization data |
| field_of_interests | list[str] | User's fields of interest |

Usage Example
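An illustrative request payload. The keys come from the tables above; the values are examples only, and optional fields are omitted.

```python
from typing import Any

# Example request as it might arrive from the API/UI.
request: dict[str, Any] = {
    "message": "What is our refund policy?",
    "conversation_id": "conv-123",
    "user_id": "user-456",
    "knowledge_base_id": "kb-789",
    "search_type": "normal",
    "chat_history": [],
    "lang_id": "en",
    "is_regenerated": False,
}
```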


pipeline_config: dict[str, Any]

Same as in the build() function. Contains pipeline configuration including model settings and pipeline-specific configurations (preset + runtime config).

See the build() function's pipeline_config parameter for detailed field descriptions.

Usage in build_initial_state()
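A sketch of reading retrieval settings from pipeline_config inside build_initial_state(). The helper read_retrieval_settings and the fallback defaults are illustrative assumptions, not documented values.

```python
from typing import Any

def read_retrieval_settings(pipeline_config: dict[str, Any]) -> dict[str, Any]:
    """Pull commonly used knobs from the config, with assumed defaults."""
    return {
        "top_k": pipeline_config.get("normal_search_top_k", 5),
        "augment_context": pipeline_config.get("augment_context", True),
        "history_limit": pipeline_config.get("chat_history_limit", 10),
    }

settings = read_retrieval_settings({"normal_search_top_k": 3})
```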


previous_state: dict[str, Any] | None

The state from the previous pipeline stage (typically from preprocessing). This allows your pipeline to access data that was already processed.

Common Fields from Preprocessing

Query Processing

| Field | Type | Description |
| --- | --- | --- |
| user_query | str | The processed user query |
| masked_user_query | str | Query with PII masked/anonymized |
| generation_query | str | Query prepared for generation |
| standalone_query | str | Standalone query extracted from conversation context |
| retrieval_query | str | Query prepared for retrieval |

History & Context

| Field | Type | Description |
| --- | --- | --- |
| history | list[Message] | Conversation history as Message objects |
| augmented_history | list[Message] | History with additional context |
| transformed_history | list[Message] | History after transformations |

Event & State Management

| Field | Type | Description |
| --- | --- | --- |
| event_emitter | EventEmitter | Event emitter for pipeline events |
| events | list[dict] | List of events emitted so far |
| cache_hit | bool | Whether the response came from cache |

Anonymization

| Field | Type | Description |
| --- | --- | --- |
| anonymized_mappings | list[AnonymizerMapping] | PII anonymization mappings |

Media

| Field | Type | Description |
| --- | --- | --- |
| media_mapping | dict | Mapping of media references |
| chat_history_media_mapping | dict | Media mappings from chat history |

Memory

| Field | Type | Description |
| --- | --- | --- |
| memory_identifier | dict | Identifiers for memory retrieval |
| memory_results | list[Chunk] | Retrieved memory chunks |

Other

| Field | Type | Description |
| --- | --- | --- |
| steps | list[dict] | Steps taken so far in processing |
| extra_contents | list[MessageContent] | Additional content to pass to the model |
| response | str | Current response (may be empty initially) |

Usage Example


**kwargs: Any

Additional keyword arguments that may be passed by the pipeline execution framework. These are context-specific and may vary.

Common kwargs Fields

| Field | Type | Description |
| --- | --- | --- |
| event_emitter | EventEmitter | Event emitter instance (if not in previous_state) |
| last_message_id | str | ID of the last message in the conversation |
| organization_id | str | Organization identifier for multi-tenancy |

Usage Example


Helper Functions

get_retrieval_params()

Extracts retrieval parameters from pipeline config and request.

get_prompt_builder_by_scope()

Gets prompt builder for a specific scope.

get_lmrp_by_scope()

Gets LMRP (Language Model Request Processor) for a specific scope.

get_lm_invoker()

Creates a language model invoker.

get_embeddings()

Creates an embeddings model.


Quick Reference

Most Commonly Used Fields

From pipeline_config:

  • model_name - Which model to use

  • model_kwargs - Model parameters

  • normal_search_top_k / smart_search_top_k - Retrieval counts

  • augment_context - Enable/disable retrieval

  • chat_history_limit - History limit

From request:

  • message - User's query

  • conversation_id - Conversation ID

  • user_id - User ID

  • knowledge_base_id - KB to search

  • search_type - Search method

From previous_state:

  • user_query - Processed query

  • event_emitter - For streaming

  • history - Chat history

  • cache_hit - Cache status
