Before you begin building your RAG system with our SDK, let us introduce the components involved: the building blocks, like Lego bricks, that you can combine to craft your system. Feel free to come back here whenever you get lost!
Pipeline Diagram
This diagram shows the position of each component in the system. The components in the diagram are described below:
⚙ Guardrail Enforcer
Guardrail
⚙ Pipeline Router
Pipeline Router
gllm-misc
| Involves LM | Related tutorials: coming soon | API Reference
Component that decides which processing approach to use for each type of user instruction/question.
Features
Decides which path a query should take through the system, since different types of user instructions/questions require different processing approaches.
Uses language model or semantic analysis to determine the optimal processing path for each query.
Supports route filtering to restrict available routes during processing.
Provides configurable default routes and validation for route selection.
Note:
We suggest using the content-based routing strategy, which routes based on query content analysis (using the Aurelio Labs library or embeddings).
⚙ Data Ingestion
Data Store
gllm-datastore
| Related tutorials: Index Your Data | API Reference
The place where knowledge is stored, a.k.a. the knowledge base.
Features
Supported Data Store Types:
Vector DB: Stores information as mathematical vectors for semantic search (see API Reference).
Graph DB: Stores information as connected networks (see API Reference).
Document Processing Orchestrator
gllm-docproc
| Related tutorials: Simple DPO Pipeline (Loader) | API Reference
Component that orchestrates document processing from ingestion to storage in the data store.
Features
Supported Types
Document: .docx, .pdf, .pptx, .xlsx
Text: .csv, .html, .java, .js, .jsx, .log, .md, .py, .ts, .tsx, .txt
URL: any public URL that's not behind protection (e.g. IP block, anti-bot)
Image (to text): .heic, .heif, .jpg, .jpeg, .png, .webp
Audio (to text): .flac, .mp3, .ogg, .wav
YouTube URL (to text; if not blocked by Google)
Chunking Strategies (based on this article)
Data Store
Store data into a vector database.
Store data into a graph database.
Miscellaneous
Extract basic math equations from PDFs.
Limitations (planned to be supported)
Can NOT process video yet.
Can NOT store data into a tabular database yet.
Can NOT support transient processing yet (e.g. just chunking a document without storing it into a vector database).
Limitations (no plans to support)
PDF
Can NOT extract advanced math equations.
DOCX
Can NOT extract math equations.
URL
Can NOT bypass URLs behind protection (e.g. IP block, anti-bot).
Can NOT access social media (Facebook, Instagram, X, TikTok).
Can NOT get a specific part from HTML (as the combinations would be infinite).
Specific projects can still customize the result by extending the gllm-docproc library.
Can NOT process executable / package file (e.g. .dmg, .exe, .gz, .tar, .zip)
Can NOT process files with proprietary extensions (e.g. .ai, .psd, .dll)
Can NOT crawl/scrape URLs periodically.
Specific projects are responsible for managing their own scheduler/cron.
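To make the chunking idea above concrete, here is a minimal fixed-size chunking sketch with overlap, one of the common strategies. The function name and parameters are ours, not gllm-docproc's API.

```python
# Illustrative fixed-size chunking with overlap (a common strategy;
# the function name and parameters are ours, not gllm-docproc's API).

def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most `chunk_size` characters,
    with `overlap` characters shared between consecutive chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last chunk reached the end of the text
    return chunks
```

The overlap keeps context that straddles a chunk boundary available in both neighboring chunks.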
⚙ Retrieval
Query Transformer
gllm-retrieval
| Involves LM | Related tutorials: Query Transformation | API Reference
Component that converts natural language into better search queries using Language Model.
Features
Enhances queries for searching by rephrasing unclear questions or adding missing context.
Uses language model to improve your query.
Supports various error handling strategies.
Currently supported transformation strategies:
One-to-one transformation:
Creates one optimized query from input query; suitable for Step-Back Prompting or HyDE (Hypothetical Document Embeddings).
Many-to-one transformation:
Combines multiple queries into one optimal query; suitable for query expansion or fusion.
One-to-many transformation:
Expands a query into multiple queries; suitable for query expansion or query decomposition.
Text-to-SQL transformation:
Converts text to SQL; suitable for database-related questions/instructions.
Note: Use cases may vary depending on the specific retrieval requirements and data characteristics.
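As an illustration of the one-to-many strategy, here is a sketch of the interface shape. A real implementation would call a language model; the `decompose` stub below just splits a compound question on "and".

```python
# Illustrative one-to-many query transformation. A real implementation
# would call a language model; `decompose` is a stub that splits a
# compound question on "and" to show the interface shape.

def decompose(query: str) -> list[str]:
    """Stub LM: split a compound question into sub-queries."""
    parts = [p.strip(" ?") for p in query.split(" and ")]
    return [p + "?" for p in parts if p]

def one_to_many(query: str, max_queries: int = 3) -> list[str]:
    """Expand one query into several; fall back to the original query."""
    sub_queries = decompose(query)[:max_queries]
    return sub_queries or [query]
```

Each sub-query can then be retrieved independently and the results fused downstream.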
Retrieval Parameter Extractor
gllm-retrieval
| Involves LM | Related tutorials: Parameter Extraction | API Reference
Component that determines optimal search parameters for retrieval operations.
Features
Uses LLM to analyze queries and extract parameters; suitable for complex, context-aware parameter extraction.
Extracts various retrieval parameters:
Query: The search query string.
Filters: Metadata filters with operators (eq, neq, gt, gte, lt, lte, in, nin, like).
Sorting: Sort conditions with order (asc, desc); suitable for result ordering.
Provides validation mechanisms for extracted parameters.
Supports dynamic parameter adjustment based on query characteristics.
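To illustrate how extracted filters might be applied, here is a sketch covering most of the operators listed above (all but like). The `matches` function and the (field, op, value) tuple format are our own assumptions, not the SDK's API.

```python
# Illustrative metadata filter evaluation for the operators
# eq, neq, gt, gte, lt, lte, in, nin (names are ours, not the SDK's).
import operator

OPS = {
    "eq": operator.eq, "neq": operator.ne,
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
    "in": lambda value, allowed: value in allowed,
    "nin": lambda value, allowed: value not in allowed,
}

def matches(metadata: dict, filters: list[tuple[str, str, object]]) -> bool:
    """Check metadata against a list of (field, op, value) conditions."""
    return all(OPS[op](metadata.get(field), value) for field, op, value in filters)
```

A retriever would pass only the documents whose metadata satisfies every extracted condition.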
Retriever
gllm-retrieval
| Involves EM | Related tutorials: Create the Retriever | API Reference
Component that searches through the knowledge base to find relevant information.
Features
Searches through the knowledge base.
Finds documents, passages, or data points relevant to your question.
Supports multiple retrieval strategies such as:
Vector Search: Semantic similarity using embeddings.
Entity Relationships: Leverages structured knowledge graphs to find information through entity relationships and graph traversal patterns.
SQL Search: Enables natural language to SQL conversion for querying structured databases.
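A minimal sketch of the vector search strategy: rank documents by cosine similarity between a query embedding and stored embeddings. The toy vectors and function names are ours; a real retriever would use an embedding model and a vector DB.

```python
# Illustrative vector search over toy embeddings (a real retriever
# would use an embedding model and a vector database instead).
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def vector_search(query_vec: list[float], index: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Return the top_k document ids most similar to the query vector."""
    ranked = sorted(index, key=lambda doc_id: cosine(query_vec, index[doc_id]), reverse=True)
    return ranked[:top_k]
```

The same interface shape applies whether the index holds a handful of vectors or millions in a vector DB.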
Chunk Processor
gllm-retrieval
| Related tutorials: Chunk Processing | API Reference
Component that processes and optimizes retrieved chunks for better context handling.
Features
Supports multiple processing strategies, including:
Deduplication: Removes duplicate chunks based on content similarity; suitable for reducing redundancy in retrieved results.
Merging: Combines related chunks into larger, more coherent segments; suitable for improving context continuity.
Basic processing: Standard chunk processing without special modifications; suitable for simple retrieval scenarios.
Uses similarity-based algorithms for deduplication and merging operations.
Provides configurable similarity thresholds and processing parameters.
Maintains chunk metadata and relationships during processing.
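The deduplication strategy can be sketched as follows, with word-level Jaccard similarity standing in for the embedding similarity a real chunk processor would use; the names and threshold are our own.

```python
# Illustrative similarity-based deduplication. Word-level Jaccard
# similarity stands in for embedding similarity here.

def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two chunks."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

def deduplicate(chunks: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a chunk only if it is not too similar to any kept chunk."""
    kept: list[str] = []
    for chunk in chunks:
        if all(jaccard(chunk, other) < threshold for other in kept):
            kept.append(chunk)
    return kept
```

The configurable `threshold` corresponds to the similarity thresholds mentioned above.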
Reranker
gllm-retrieval
| Involves EM | Related tutorials: Reranking | API Reference
Component that reorders retrieved results by relevance and importance.
Features
Supports multiple reranking methods, including:
Similarity-based reranking: Uses embedding similarity scores; suitable for semantic relevance ranking.
Text Embedding Inference (TEI): Uses TEI models for high-performance reranking; suitable for large-scale applications.
FlagEmbedding-based reranking: Uses FlagEmbedding models; suitable for multilingual and specialized domains.
Uses embedding models to calculate relevance scores.
Provides configurable ranking thresholds and parameters.
Supports batch processing for improved performance.
⚙ Context Manipulation
Relevance Filter
gllm-misc
| Involves EM | Involves LM | Related tutorials: Relevance Filtering
Component that removes irrelevant information from retrieved context.
Features
Supports multiple filtering methods, including:
Semantic similarity filtering: Filters based on vector similarity scores; suitable for embedding-based relevance assessment.
Language model-based filtering: Uses LLM to determine chunk relevance; suitable for context-aware filtering with high accuracy.
Uses embedding models or language models to assess relevance.
Provides configurable similarity thresholds and filtering criteria.
Supports batch processing for improved performance.
Context Enricher
gllm-misc
| Involves LM | Related tutorials: Context Enrichment
Component that enhances context with additional metadata and information.
Features
Supports multiple enrichment strategies, including:
Basic context enrichment: Adds fundamental metadata to chunks; suitable for simple context enhancement.
Metadata-based enrichment: Enhances context with detailed metadata information; suitable for comprehensive context building.
Uses language models to generate contextual information.
Provides configurable enrichment parameters and formatting options.
Supports metadata information formatting and structuring.
Repacker
gllm-misc
| Related tutorials: Create the Repacker | API Reference
Component that packages retrieved chunks into formats optimized for LLM understanding.
Features
Supports multiple packing strategies, including:
Forward packing: Maintains original chunk order; suitable for preserving document flow.
Reverse packing: Reverses chunk order; suitable for prioritizing recent or important information.
Sides packing: Alternates chunks from end and start; suitable for balanced context presentation.
Provides configurable size limits and delimiter options.
Supports both chunk-based and context-based packing modes.
Includes size measurement functions for optimal packing.
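The three packing strategies can be sketched as follows. The `pack` function is ours, and the sides ordering (last, first, second-last, ...) is one plausible reading of "alternates chunks from end and start", not necessarily the SDK's exact ordering.

```python
# Illustrative packing strategies: forward, reverse, and sides.
# Function name and sides ordering are our own interpretation.

def pack(chunks: list[str], mode: str = "forward") -> list[str]:
    """Reorder chunks according to the chosen packing strategy."""
    if mode == "forward":
        return list(chunks)
    if mode == "reverse":
        return list(reversed(chunks))
    if mode == "sides":
        # Alternate from the end and the start: last, first, second-last, ...
        ordered, left, right = [], 0, len(chunks) - 1
        while left <= right:
            ordered.append(chunks[right])
            if left < right:
                ordered.append(chunks[left])
            left, right = left + 1, right - 1
        return ordered
    raise ValueError(f"unknown mode: {mode}")
```

Sides packing is often motivated by "lost in the middle" effects: it keeps the strongest chunks near both ends of the context.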
Compressor
gllm-misc
| Involves LM | Related tutorials: Context Compression | API Reference
Component that reduces context size while preserving essential information.
Features
Supports multiple compression methods, including:
LLMLingua compression: Uses LLMLingua models for intelligent compression; suitable for high-quality content reduction.
Basic compression: Standard compression without special algorithms; suitable for simple size reduction.
Uses language models to identify and preserve important information.
Provides configurable compression ratios and quality thresholds.
Supports various compression strategies based on content type and requirements.
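As an illustration of basic compression, here is a sketch that keeps query-relevant sentences within a character budget. The heuristic and names are ours; the LLMLingua strategy above uses a model instead.

```python
# Illustrative basic compression: keep query-relevant sentences
# within a size budget (a stand-in for model-based compression).

def compress(context: str, query: str, max_chars: int = 120) -> str:
    """Keep sentences that mention query terms, within max_chars."""
    terms = set(query.lower().split())
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    relevant = [s for s in sentences if terms & set(s.lower().split())]
    kept, total = [], 0
    for sentence in relevant:
        if total + len(sentence) > max_chars:
            break  # budget exhausted
        kept.append(sentence)
        total += len(sentence)
    return ". ".join(kept) + ("." if kept else "")
```

The `max_chars` budget plays the role of the configurable compression ratio mentioned above.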
⚙ Generation
Response Synthesizer
gllm-generation
| Involves LM | Related tutorials: Your First RAG Pipeline | API Reference
Component that generates final responses by combining query, context, and history.
Features
Supports multiple synthesis strategies, including:
Stuff synthesis: Combines all context into a single prompt; suitable for comprehensive responses.
Static list synthesis: Uses predefined response templates; suitable for structured, consistent outputs.
Uses language models to generate coherent and relevant responses.
Supports streaming responses for real-time output.
Provides configurable hyperparameters and system prompts.
Handles multimodal content and attachments.
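The stuff synthesis strategy can be sketched as a prompt that combines query, context, and history in one template. The template and function name are ours, and the actual language model call is omitted.

```python
# Illustrative "stuff" synthesis: combine query, context chunks, and
# history into one prompt. A real synthesizer would send this prompt
# to a language model (possibly streaming the response).

PROMPT_TEMPLATE = """Answer the question using only the context below.

Context:
{context}

History:
{history}

Question: {query}
Answer:"""

def build_stuff_prompt(query: str, chunks: list[str], history: list[str]) -> str:
    """Stuff all context chunks and history into a single prompt."""
    return PROMPT_TEMPLATE.format(
        context="\n".join(f"- {c}" for c in chunks),
        history="\n".join(history) or "(none)",
        query=query,
    )
```

Because everything is stuffed into one prompt, this strategy is limited by the model's context window; the Compressor and Repacker above help keep it within budget.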
Reference Formatter
gllm-generation
| Involves EM | Involves LM | Related tutorials: Reference Formatting | API Reference
Component that formats citations and sources in generated responses.
Features
Supports multiple formatting strategies, including:
Language model-based formatting: Uses LLM to generate contextual citations; suitable for natural, integrated references.
Similarity-based formatting: Uses embedding similarity for reference matching; suitable for precise source attribution.
Basic formatting: Standard reference formatting; suitable for simple citation requirements.
Uses language models or embedding models to enhance reference quality.
Provides configurable citation formats and styles.
Ensures proper attribution of information sources.
⚙ Conversation History, Cache, and Memory Manager
Chat History Manager
gllm-misc
| Involves LM | Related tutorials: Chat History | API Reference
Component that manages conversation history for consistent and contextual responses.
Features
Supports multiple history processing methods, including:
Similarity-based filtering: Filters message pairs using embedding similarity; suitable for removing redundant conversations.
Language model-based processing: Uses LLM to select relevant message pairs; suitable for intelligent history curation.
Uses language models or embedding models to process conversation history.
Provides configurable data retention and deletion policies.
Supports conversation threading and context management.
Handles multiple storage backends.
Cache Manager
gllm-misc
| Involves LM | Related tutorials: Caching Implementation | API Reference
Component that caches frequently accessed information for improved response speed.
Features
Supports multiple cache backends and strategies.
Uses language models to generate cache keys and validate cached content.
Provides configurable TTL and invalidation policies.
Uses Data Store to store cache information.
Supports cache warming and intelligent cache management.
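A minimal sketch of TTL-based invalidation, with an in-memory dict standing in for the Data Store backend; the class and its interface are our own simplification.

```python
# Illustrative cache with TTL-based invalidation. An in-memory dict
# stands in for the Data Store backend described above.
import time

class TTLCache:
    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable clock, handy for testing
        self._store: dict[str, tuple[float, object]] = {}

    def set(self, key: str, value: object) -> None:
        self._store[key] = (self.clock(), value)

    def get(self, key: str, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]  # expired: invalidate lazily
            return default
        return value
```

In a RAG pipeline the key is typically derived from the (normalized) query, so repeated questions skip retrieval and generation entirely.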
⚙ Inference
Some components may involve language or embedding models, marked with the tag Involves LM or Involves EM. These are the key components that enable a seamless inference process:
LM Request Processor
API Reference | Related tutorials: Part 3 of Create the Response Synthesizer
Component that provides unified interface for LLM interactions.
Features
Integrates prompt builder, LM invoker, and output parser into a single interface.
Provides unified interface for LLM interactions.
Supports multiple LLM providers and configurations.
Handles request processing and response management.
Catalog
API Reference | Related tutorials: Catalog
Component that creates LM request processors or prompt builders from external data sources.
Features
Supports multiple data sources, including:
Record-based creation: Creates processors from structured records; suitable for predefined configurations.
Google Sheets integration: Creates processors from Google Sheets data; suitable for collaborative configurations.
CSV file processing: Creates processors from CSV files; suitable for bulk configuration management.
Provides automated processor creation from external data.
Supports dynamic configuration updates.
Enables easy deployment and management of LLM processors.
Prompt Builder
API Reference | Related tutorials: Prompt Building
Component that constructs prompts from templates and dynamic content.
Features
Supports variable substitution and conditional logic.
Handles different prompt formats and structures.
Provides template-based prompt construction.
Supports dynamic content integration.
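Variable substitution with validation can be sketched in plain Python as follows; the `build_prompt` helper is our illustration, not the SDK's Prompt Builder API.

```python
# Illustrative template-based prompt building with variable
# substitution and validation of missing variables.
import string

def build_prompt(template: str, **variables: str) -> str:
    """Substitute variables into the template, failing fast on missing ones."""
    required = {
        field for _, field, _, _ in string.Formatter().parse(template)
        if field is not None
    }
    missing = required - variables.keys()
    if missing:
        raise ValueError(f"missing variables: {sorted(missing)}")
    return template.format(**variables)
```

Failing fast on missing variables catches template/caller mismatches before an incomplete prompt ever reaches the model.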
LM Invoker
API Reference | Related tutorials: Quickstart with LM Invoker
Component that provides interface for interacting with multiple LLM providers.
Features
Supports multiple providers: OpenAI, Anthropic, Google, etc.
Handles streaming, batching, and error management.
Provides unified interface for different LLM services.
Supports configurable model parameters and settings.
Output Parser
API Reference | Related tutorials: Structured Output
Component that extracts structured information from LLM responses.
Features
Validates response format and content.
Handles parsing errors gracefully.
Supports structured output extraction.
Provides configurable parsing rules and validation.
EM Invoker
API Reference | Related tutorials: Index Your Data
Component that handles embedding model interactions.
Features
Converts text to vector representations.
Supports multiple embedding providers and models.
Provides unified interface for embedding operations.
Handles batch processing and error management.