Function Parameters Reference
Comprehensive reference for all parameters in the required pipeline functions.
build() Function Parameters
Function Signature
```python
async def build(
    pipeline_config: dict[str, Any],
    prompt_builder_catalogs: dict[str, BaseCatalog[Any]] | None = None,
    lmrp_catalogs: dict[str, BaseCatalog[Any]] | None = None,
) -> Pipeline:
```

pipeline_config: dict[str, Any]

Pipeline configuration including model settings and pipeline-specific configurations (preset + runtime config).
Model Configuration
| Field | Type | Description |
| --- | --- | --- |
| `model_name` | `str` | Language model identifier (e.g., `"openai/gpt-4"`, `"gemini-pro"`) |
| `model_kwargs` | `dict` | Model parameters (temperature, max_tokens, etc.) |
| `model_env_kwargs` | `dict` | Environment settings (API keys, endpoints) |
| `vectorizer_kwargs` | `dict` | Embedding model configuration and parameters |
Retrieval Settings
| Field | Type | Description |
| --- | --- | --- |
| `normal_search_top_k` | `int` | Number of chunks for normal search (≥ 1) |
| `smart_search_top_k` | `int` | Number of chunks for smart search (≥ 1) |
| `web_search_top_k` | `int` | Number of chunks for web search (≥ 1) |
| `vector_weight` | `float` | Weight for vector similarity in hybrid search (0.0-1.0) |
| `rerank_type` | `str` | Type of reranking method to apply |
| `rerank_kwargs` | `str` | Reranking parameters (JSON string) |
| `enable_mmr` | `bool` | Enable Maximal Marginal Relevance reranking |
| `fetch_k` | `int` | Number of candidates to fetch for MMR |
| `lambda_mult` | `float` | MMR diversity parameter (0 = max diversity, 1 = max relevance; 0.0-1.0) |
Pipeline Behavior
| Field | Type | Description |
| --- | --- | --- |
| `augment_context` | `bool` | Whether to augment context from the knowledge base |
| `use_model_knowledge` | `bool` | Whether to allow the model to use its built-in knowledge |
| `use_cache` | `bool` | Whether to check for and use cached responses |
| `chat_history_limit` | `int` | Maximum number of history messages to include |
| `prompt_context_char_threshold` | `int` | Character limit for context in prompts |
| `support_multimodal` | `bool` | Whether the pipeline supports multimodal inputs |
Response Formatting
| Field | Type | Description |
| --- | --- | --- |
| `reference_formatter_type` | `str` | Type of reference formatter (`"lm"`, `"none"`) |
| `reference_formatter_threshold` | `float` | Threshold for reference formatting (0.0-1.0) |
| `reference_formatter_batch_size` | `int` | Batch size for reference formatting (≥ 1) |
| `generation_strategy` | `str` | Response generation strategy (`"stuff"` or `"refine"`) |
| `strategy_batch_size` | `int` | Batch size for the generation strategy (≥ 1) |
| `repacker_method` | `str` | Chunk ordering method (`"forward"`, `"reverse"`, `"sides"`) |
Advanced Features
| Field | Type | Description |
| --- | --- | --- |
| `enable_personalization` | `bool` | Whether to enable response personalization |
| `enable_guardrails` | `bool` | Whether to enable content guardrails |
| `guardrail_mode` | `str` | Guardrail mode configuration |
| `anonymize_em` | `bool` | Whether to anonymize input before it reaches the embedding model |
| `anonymize_lm` | `bool` | Whether to anonymize input before it reaches the language model |
Usage Example
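A minimal sketch of a `pipeline_config` built from the fields documented above. Field names come from this reference; the concrete values are illustrative only.

```python
# Illustrative pipeline_config using fields from this reference.
pipeline_config = {
    # Model configuration
    "model_name": "openai/gpt-4",
    "model_kwargs": {"temperature": 0.2, "max_tokens": 1024},
    "model_env_kwargs": {"api_key": "<your-api-key>"},
    # Retrieval settings
    "normal_search_top_k": 5,
    "smart_search_top_k": 10,
    "vector_weight": 0.7,
    "enable_mmr": True,
    "fetch_k": 20,
    "lambda_mult": 0.5,
    # Pipeline behavior
    "augment_context": True,
    "use_cache": True,
    "chat_history_limit": 10,
}

# Inside an async context, catalogs default to None:
# pipeline = await build(pipeline_config)
```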
prompt_builder_catalogs: dict[str, BaseCatalog[Any]] | None

Dictionary mapping catalog identifiers to BaseCatalog instances containing prompt builder configurations.
Structure
- Key: Catalog identifier (e.g., `"standard_rag"`, `"no_op"`, scope name)
- Value: `BaseCatalog[Any]` instance containing prompt templates and configurations
Purpose
- Provides access to prompt templates for different use cases
- Allows dynamic prompt selection based on pipeline scope or model
- Enables prompt customization without code changes
When to Use
- When building response synthesizers or generation components
- When you need to construct prompts for language model calls
- When implementing custom prompt logic
Usage Example
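A sketch of selecting a prompt builder by scope with a fallback. The dict-style lookup on the catalogs mapping follows the parameter's type; the fallback scope name and the helper itself are illustrative assumptions.

```python
# Sketch: pick a prompt builder catalog by scope, falling back to a default.
def select_prompt_builder(catalogs, scope, default_scope="standard_rag"):
    """Return the catalog for `scope`, or the default scope's catalog."""
    if not catalogs:
        return None
    return catalogs.get(scope) or catalogs.get(default_scope)

# Stand-ins for BaseCatalog[Any] instances:
catalogs = {
    "standard_rag": "<BaseCatalog for standard RAG prompts>",
    "no_op": "<BaseCatalog that passes input through>",
}
builder = select_prompt_builder(catalogs, "summarization")
# "summarization" is not registered, so this falls back to "standard_rag".
```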
lmrp_catalogs: dict[str, BaseCatalog[Any]] | None

Dictionary mapping catalog identifiers to BaseCatalog instances containing Language Model Request Processor (LMRP) configurations.
Structure
- Key: Catalog identifier (e.g., prompt name, use case identifier)
- Value: `BaseCatalog[Any]` instance containing LMRP configurations
Purpose
- Provides access to LMRP configurations for query transformation
- Enables query rewriting, expansion, or refinement
- Supports different processing strategies for different use cases
When to Use
- When building retrieval pipelines that need query transformation
- When implementing query rewriting or expansion steps
- When you need to process queries before retrieval or generation
Usage Example
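A sketch of picking an LMRP configuration per use case. The catalog keys and configuration shape are illustrative assumptions, not the actual catalog contents.

```python
# Sketch: choose an LMRP configuration for a given use case.
def get_query_transformer(lmrp_catalogs, use_case):
    """Return the LMRP config for `use_case`, or None to skip transformation."""
    if not lmrp_catalogs:
        return None
    return lmrp_catalogs.get(use_case)

# Stand-ins for BaseCatalog[Any] instances keyed by use case:
lmrp_catalogs = {
    "query_rewrite": {"prompt": "Rewrite the query: {query}"},
    "query_expansion": {"prompt": "Expand the query: {query}"},
}
transformer = get_query_transformer(lmrp_catalogs, "query_rewrite")
```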
build_initial_state() Function Parameters
Function Signature
request: dict[str, Any]

User request data and conversation context. This is the raw input from the API/UI.
Message Content
| Field | Type | Description |
| --- | --- | --- |
| `message` | `str` | User's input message / text query |
| `original_message` | `str` | Original message without any preprocessing |
| `binaries` | `list` | Binary data objects (images, files, etc.) |
| `user_multimodal_contents` | `list` | Multimodal content from the user |
Conversation Context
| Field | Type | Description |
| --- | --- | --- |
| `conversation_id` | `str` | Identifier for the conversation thread |
| `user_id` | `str` | Identifier for the user making the request |
| `parent_id` | `str` | ID of the parent message (for threaded conversations) |
| `user_message_id` | `str` | Unique identifier for this user message |
| `assistant_message_id` | `str` | Unique identifier for the assistant's response |
| `chat_history` | `list[Message]` | Previous messages in the conversation |
| `last_message_id` | `str` | ID of the last message in the conversation |
Knowledge Base & Retrieval
| Field | Type | Description |
| --- | --- | --- |
| `knowledge_base_id` | `str` | Identifier for the knowledge base to retrieve from |
| `search_type` | `str` | Type of search (`"normal"`, `"smart"`, `"hybrid"`, `"search"`) |
| `connectors` | `list[str]` | List of connector identifiers to use |
| `connector_user_token` | `str` | Authentication token for connectors |
| `filters` | `list[dict]` | Metadata filters for retrieval |
Attachments
| Field | Type | Description |
| --- | --- | --- |
| `attachments` | `dict` | User-provided file attachments |
| `attachment_chunk_size` | `int` | Size for chunking attachments |
| `use_docproc` | `bool` | Whether to use the document processing orchestrator |
| `model_supported_attachments` | `dict` | Model-specific attachment configurations |
Metadata
| Field | Type | Description |
| --- | --- | --- |
| `source` | `str` | Source of the request (e.g., `"web"`, `"api"`, `"mobile"`) |
| `lang_id` | `str` | Language identifier (e.g., `"en"`, `"id"`) |
| `start_time` | `float` | Timestamp when the request was initiated |
| `quote` | `str` | Quoted message text if replying to a specific message |
| `client_type` | `str` | Client type making the request |
Request Flags
| Field | Type | Description |
| --- | --- | --- |
| `is_regenerated` | `bool` | Whether the message was regenerated |
| `is_edited` | `bool` | Whether the message was edited |
| `is_retried` | `bool` | Whether the message was retried |
Advanced
| Field | Type | Description |
| --- | --- | --- |
| `hyperparameters` | `dict` | Dynamic hyperparameters |
| `personalization_details` | `dict` | User personalization data |
| `field_of_interests` | `list[str]` | User's fields of interest |
Usage Example
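A sketch of a `request` dict using fields from this reference. All values are illustrative.

```python
# Illustrative request payload using fields documented above.
request = {
    "message": "What is our refund policy?",
    "conversation_id": "conv-123",
    "user_id": "user-456",
    "knowledge_base_id": "kb-policies",
    "search_type": "normal",
    "chat_history": [],
    "filters": [{"key": "department", "value": "support"}],
    "lang_id": "en",
    "is_regenerated": False,
}
```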
pipeline_config: dict[str, Any]

Same as in the build() function: pipeline configuration including model settings and pipeline-specific configurations (preset + runtime config).
See the build() function's pipeline_config parameter for detailed field descriptions.
Usage in build_initial_state()
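A sketch of reading `pipeline_config` inside build_initial_state, for example to choose the retrieval count by search type. Field names come from this reference; the helper and its default values are illustrative assumptions.

```python
# Sketch: resolve top_k from pipeline_config based on the request's search type.
def resolve_top_k(pipeline_config, request):
    """Pick the top_k that matches the request's search type."""
    if request.get("search_type") == "smart":
        return pipeline_config.get("smart_search_top_k", 10)
    return pipeline_config.get("normal_search_top_k", 5)

top_k = resolve_top_k({"normal_search_top_k": 8}, {"search_type": "normal"})
# top_k == 8
```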
previous_state: dict[str, Any] | None

The state from the previous pipeline stage (typically preprocessing). This allows your pipeline to access data that was already processed.
Common Fields from Preprocessing
Query Processing
| Field | Type | Description |
| --- | --- | --- |
| `user_query` | `str` | The processed user query |
| `masked_user_query` | `str` | Query with PII masked/anonymized |
| `generation_query` | `str` | Query prepared for generation |
| `standalone_query` | `str` | Standalone query extracted from conversation context |
| `retrieval_query` | `str` | Query prepared for retrieval |
History & Context
| Field | Type | Description |
| --- | --- | --- |
| `history` | `list[Message]` | Conversation history as Message objects |
| `augmented_history` | `list[Message]` | History with additional context |
| `transformed_history` | `list[Message]` | History after transformations |
Event & State Management
| Field | Type | Description |
| --- | --- | --- |
| `event_emitter` | `EventEmitter` | Event emitter for pipeline events |
| `events` | `list[dict]` | List of events emitted so far |
| `cache_hit` | `bool` | Whether the response came from cache |
Anonymization
| Field | Type | Description |
| --- | --- | --- |
| `anonymized_mappings` | `list[AnonymizerMapping]` | PII anonymization mappings |
Media
| Field | Type | Description |
| --- | --- | --- |
| `media_mapping` | `dict` | Mapping of media references |
| `chat_history_media_mapping` | `dict` | Media mappings from chat history |
Memory
| Field | Type | Description |
| --- | --- | --- |
| `memory_identifier` | `dict` | Identifiers for memory retrieval |
| `memory_results` | `list[Chunk]` | Retrieved memory chunks |
Other
| Field | Type | Description |
| --- | --- | --- |
| `steps` | `list[dict]` | Steps taken so far in processing |
| `extra_contents` | `list[MessageContent]` | Additional content to pass to the model |
| `response` | `str` | Current response (may be empty initially) |
Usage Example
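Since `previous_state` may be None, reading its fields defensively with defaults is the safe pattern. A sketch using field names from this reference (the sample values are illustrative):

```python
# Sketch: safely read fields from previous_state, which may be None.
previous_state = {"user_query": "refund policy", "cache_hit": False, "history": []}

state = previous_state or {}
user_query = state.get("user_query", "")
event_emitter = state.get("event_emitter")  # may also arrive via **kwargs
history = state.get("history", [])
cache_hit = state.get("cache_hit", False)
```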
**kwargs: Any

Additional keyword arguments that may be passed by the pipeline execution framework. These are context-specific and may vary.
Common kwargs Fields
| Field | Type | Description |
| --- | --- | --- |
| `event_emitter` | `EventEmitter` | Event emitter instance (if not in previous_state) |
| `last_message_id` | `str` | ID of the last message in the conversation |
| `organization_id` | `str` | Organization identifier for multi-tenancy |
Usage Example
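Because `event_emitter` can arrive either in `previous_state` or in `**kwargs`, a common pattern is to check both. A sketch (the helper is illustrative, not part of the framework):

```python
# Sketch: prefer the emitter from previous_state, fall back to kwargs.
def get_event_emitter(previous_state, **kwargs):
    """Return the event emitter from previous_state or kwargs, if any."""
    if previous_state and previous_state.get("event_emitter") is not None:
        return previous_state["event_emitter"]
    return kwargs.get("event_emitter")

emitter = get_event_emitter(
    None, event_emitter="<EventEmitter instance>", organization_id="org-1"
)
```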
Helper Functions
- `get_retrieval_params()`: Extracts retrieval parameters from the pipeline config and request.
- `get_prompt_builder_by_scope()`: Gets the prompt builder for a specific scope.
- `get_lmrp_by_scope()`: Gets the LMRP (Language Model Request Processor) for a specific scope.
- `get_lm_invoker()`: Creates a language model invoker.
- `get_embeddings()`: Creates an embeddings model.
Quick Reference
Most Commonly Used Fields
From pipeline_config:
- `model_name`: Which model to use
- `model_kwargs`: Model parameters
- `normal_search_top_k` / `smart_search_top_k`: Retrieval count
- `augment_context`: Enable/disable retrieval
- `chat_history_limit`: History limit
From request:
- `message`: User's query
- `conversation_id`: Conversation ID
- `user_id`: User ID
- `knowledge_base_id`: KB to search
- `search_type`: Search method
From previous_state:
- `user_query`: Processed query
- `event_emitter`: For streaming
- `history`: Chat history
- `cache_hit`: Cache status