Building Blocks
Before you begin building your RAG system using our SDK, let us introduce its building blocks.
Tutorials for the items we describe here are available in the Tutorials section of this documentation. Feel free to come here whenever you get lost!
Pipeline Diagram
This diagram shows where each component sits in the system. The components shown in the diagram are described below:
⚙ Guardrail Enforcer
Guardrail
Coming Soon!
⚙ Router
Semantic Router
gllm-misc | Involves LM | Involves EM | Tutorial: Routing | Use Case: Implement Semantic Routing | API Reference
Decides which processing path to take given the user's instruction or question.
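To make the idea concrete, here is a minimal, SDK-independent sketch of semantic routing. The `embed`, `cosine`, and `route` names are illustrative (they are not the SDK's API), and the toy word-count "embedding" stands in for a real embedding model:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real router would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# One reference embedding per processing path (hypothetical route names).
ROUTES = {
    "rag": embed("question about the product documentation knowledge base"),
    "chitchat": embed("hello hi how are you thanks goodbye"),
}

def route(query: str) -> str:
    """Pick the route whose reference embedding is most similar to the query."""
    q = embed(query)
    return max(ROUTES, key=lambda name: cosine(q, ROUTES[name]))
```

The same shape scales up: swap the toy vectors for embedding-model outputs and the routes for your actual pipeline branches.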
⚙ Data Ingestion
Data Store
gllm-datastore | Related tutorials: Index Your Data with Vector Data Store | Your First RAG Pipeline | API Reference
The place to store knowledge, a.k.a. the knowledge base.
Document Processing Orchestrator
gllm-docproc | Related tutorials: Simple DPO Pipeline (Loader) | API Reference
Orchestrates document processing from ingestion all the way to the data store.
⚙ Retrieval
Query Transformer
gllm-retrieval | Involves LM | Tutorial: Query Transformation | Use Case: Query Transformation | API Reference
Converts natural language into better retrieval queries using a language model.
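A minimal sketch of the pattern, independent of the SDK: wrap the user's question in a rewrite prompt and send it to a language model. The `transform_query` name and the stub LM are illustrative assumptions, not the SDK's API:

```python
def transform_query(query: str, lm_invoke) -> str:
    """Ask a language model (passed in as a callable) to rewrite the query."""
    prompt = (
        "Rewrite the following user question as a concise search query, "
        "keeping only the key terms.\n"
        f"Question: {query}\nSearch query:"
    )
    return lm_invoke(prompt).strip()

# Stub LM for illustration only: keeps words longer than 3 characters.
def fake_lm(prompt: str) -> str:
    question = prompt.rsplit("Question: ", 1)[1].split("\n")[0]
    return " ".join(w for w in question.split() if len(w) > 3)
```

In practice `lm_invoke` would be a real model call; the point is that the transformer is just prompt construction plus an LM round-trip.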
Multimodal Transformer
Coming Soon!
Retrieval Parameter Extractor
gllm-retrieval | Involves LM | Tutorial: Retrieval Parameter Extractor | API Reference
Determines optimal search parameters for retrieval operations given a query.
Retriever
gllm-retrieval | Involves EM | Tutorial: Retriever | Use Case: Create the Retriever | API Reference
Searches through the knowledge base to find relevant information.
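Conceptually, a retriever embeds the query, scores every stored document against it, and returns the top-k matches. The sketch below is SDK-independent; `retrieve` and the toy word-count embedding are illustrative stand-ins for a real vector data store and embedding model:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words vector standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical in-memory knowledge base; normally this lives in a data store.
KNOWLEDGE_BASE = [
    "refunds are processed within five business days",
    "our office is open monday to friday",
    "shipping takes two to four days worldwide",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]
```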
Chunk Processor
gllm-retrieval | Tutorial: Chunk Processor | API Reference
Processes and optimizes retrieved chunks for better context handling.
Reranker
gllm-retrieval | Involves EM | Tutorial: Reranker | API Reference
Reorders retrieved results by relevance and importance.
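As an SDK-independent illustration, a reranker takes the retriever's candidates and re-sorts them with a finer-grained (usually more expensive) relevance score. Here the score is a simple query-term overlap; a real reranker would use an embedding or cross-encoder model, and the `rerank` name is an assumption:

```python
def rerank(query: str, candidates: list[str]) -> list[str]:
    """Reorder candidates by how many query terms each one contains.

    sorted() is stable, so ties keep the retriever's original order.
    """
    terms = set(query.lower().split())

    def score(doc: str) -> int:
        return sum(1 for w in doc.lower().split() if w in terms)

    return sorted(candidates, key=score, reverse=True)
```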
⚙ Generation
Compressor
gllm-generation | Tutorial: Compressor | API Reference
Reduces context size while preserving essential information.
Context Enricher
gllm-generation | Tutorial: Context Enricher | API Reference
Enhances context with additional metadata and information.
Reference Formatter
gllm-generation | Involves EM | Involves LM | Tutorial: Reference Formatter | Use Case: Adding Document References | API Reference
Formats citations and sources in generated responses.
Relevance Filter
gllm-generation | Involves EM | Involves LM | Tutorial: Relevance Filter | API Reference
Removes irrelevant information from retrieved context.
Repacker
gllm-generation | Tutorial: Repacker | Use Case: Your First RAG Pipeline | API Reference
Packages retrieved chunks into formats optimized for LLM understanding.
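The core idea can be sketched in a few lines: join the retrieved chunks into one clearly delimited context block that an LLM can reference by number. The `repack` name and chunk dictionary shape are illustrative assumptions, not the SDK's format:

```python
def repack(chunks: list[dict]) -> str:
    """Join retrieved chunks into one context block, labelling each source
    so the model (and a reference formatter) can cite it by number."""
    return "\n\n".join(
        f"[{i}] (source: {c['source']})\n{c['text']}"
        for i, c in enumerate(chunks, start=1)
    )

chunks = [
    {"source": "faq.md", "text": "Refunds take five days."},
    {"source": "policy.md", "text": "Returns accepted within 30 days."},
]
context = repack(chunks)
```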
Response Synthesizer
gllm-generation | Involves LM | Tutorial: Response Synthesizer | Use Case: Create the Response Synthesizer | API Reference
Generates final responses by combining query, context, and history.
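A minimal, SDK-independent sketch of that combination step: fold the conversation history, the repacked context, and the question into one prompt and hand it to a language model. `synthesize` and the prompt wording are illustrative assumptions:

```python
def synthesize(query: str, context: str, history: list[tuple[str, str]], lm_invoke) -> str:
    """Build the final prompt from query + context + history, then call the LM."""
    transcript = "\n".join(f"{role}: {msg}" for role, msg in history)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Conversation so far:\n{transcript}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return lm_invoke(prompt)
```

With a real model behind `lm_invoke`, the return value is the grounded answer shown to the user.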
⚙ Conversation History, Cache, and Memory Manager
Chat History Manager
gllm-misc | Involves LM | Related tutorials: Chat History | API Reference
Manages conversation history for consistent and contextual responses.
Cache Manager
gllm-misc | Involves LM | Related tutorials: Caching Implementation | Use Case: Caching | API Reference
Caches frequently accessed information for improved response speed.
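In its simplest form, a cache manager keys answers by a normalized query and skips the expensive computation on a hit. This in-memory sketch is illustrative only (the `SimpleCache` name is an assumption); a production cache would add TTLs, eviction, and often semantic matching:

```python
import hashlib

class SimpleCache:
    """Exact-match query cache keyed by a normalized hash of the query."""

    def __init__(self):
        self._store: dict[str, str] = {}
        self.hits = 0

    def _key(self, query: str) -> str:
        # Normalize so trivially different phrasings share a key.
        return hashlib.sha256(query.strip().lower().encode()).hexdigest()

    def get_or_compute(self, query: str, compute) -> str:
        key = self._key(query)
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = compute(query)
        return self._store[key]
```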
⚙ Inference
Some components may involve language or embedding models — these are marked with the Involves LM or Involves EM tag. The following are the key components that enable a seamless inference process:
LM Request Processor
gllm-inference | Tutorial: LM Request Processor (LMRP) | Use Case: Utilize Language Model Request Processor | API Reference
Provides unified interface for LLM interactions.
Catalog
gllm-inference | Tutorial: Catalog | API Reference
Stores and creates LM request processors or prompt builders from external data sources.
Prompt Builder
gllm-inference | Tutorial: Prompt Builder | Use Case: Utilize Language Model Request Processor | API Reference
Constructs prompts from templates and dynamic content.
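The pattern in miniature: keep a template with named placeholders and fill it with dynamic values at request time. This sketch uses Python's standard `string.Template` with `$`-style placeholders; the `PromptBuilder` name is illustrative, not the SDK's class:

```python
from string import Template

class PromptBuilder:
    """Fills a prompt template with dynamic values."""

    def __init__(self, template: str):
        self._template = Template(template)

    def build(self, **values) -> str:
        # Template.substitute raises KeyError if a placeholder is missing,
        # which catches incomplete prompts early.
        return self._template.substitute(values)

builder = PromptBuilder(
    "Answer using the context.\nContext: $context\nQuestion: $question"
)
prompt = builder.build(context="Cats are mammals.", question="Are cats mammals?")
```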
LM Invoker
gllm-inference | Tutorial: Language Model (LM) Invoker | Use Case: Utilize Language Model Request Processor | API Reference
Provides unified interface for interacting with multiple LM providers.
Output Parser
gllm-inference | Tutorial: Output Parser | Use Case: Produce Consistent Output from LM | API Reference
Extracts structured information from LM responses.
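A common concrete case: the model is asked for JSON but wraps it in prose or code fences. The sketch below (an illustrative `parse_json_output`, not the SDK's parser) pulls out the first JSON object and decodes it:

```python
import json
import re

def parse_json_output(raw: str) -> dict:
    """Extract the first JSON object from an LM response, tolerating
    surrounding prose and Markdown code fences."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))
```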
EM Invoker
gllm-inference | Tutorial: Embedding Model (EM) Invoker | Use Case: Your First RAG Pipeline | API Reference
Provides unified interface for interacting with multiple EM providers.
⚙️ Orchestration
Pipeline
gllm-pipeline | Tutorial: Pipeline | Use Case: Build End-to-End RAG Pipeline | Execute a Pipeline | API Reference
Sequences and manages the execution of the components in our SDK.
Steps
gllm-pipeline | Tutorial: Steps | Use Case: Build End-to-End RAG Pipeline | API Reference
The building block of a Pipeline: reads from the state, performs an operation, and writes results back.
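The read-state / operate / write-state contract can be sketched without any SDK machinery. Everything here (`step`, `run_pipeline`, the toy steps) is illustrative — a hand-rolled analogue of the idea, not gllm-pipeline's actual API:

```python
def step(reads: list[str], writes: str):
    """Wrap a plain function as a step: read keys from the state dict,
    run the function, write its result back under one key."""
    def decorator(fn):
        def run(state: dict) -> dict:
            state = dict(state)  # copy so each step stays side-effect free
            state[writes] = fn(*(state[r] for r in reads))
            return state
        return run
    return decorator

@step(reads=["query"], writes="chunks")
def retrieve(query):
    return [f"chunk about {query}"]  # stand-in for a real retriever

@step(reads=["query", "chunks"], writes="answer")
def synthesize(query, chunks):
    return f"Answer to '{query}' using {len(chunks)} chunk(s)."

def run_pipeline(steps, state: dict) -> dict:
    """A pipeline is just steps run in sequence over a shared state."""
    for s in steps:
        state = s(state)
    return state

result = run_pipeline([retrieve, synthesize], {"query": "refunds"})
```

Each step only declares what it reads and writes, which is what lets a pipeline sequence components as freely as the diagram above suggests.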