Building Blocks

Before you begin building your RAG system using our SDK, let us introduce its building blocks.

Tutorials for the items described here are available in the Tutorials section of this documentation. Feel free to come back here whenever you get lost!

Pipeline Diagram

This diagram shows the position of each component in the system. The components involved in the diagram are described below:

⚙ Guardrail Enforcer

Guardrail


⚙ Router

Semantic Router

gllm-misc | Involves LM | Involves EM | Tutorial: Routing | Use Case: Implement Semantic Routing | API Reference

Decides which processing path to take, given user instruction/question.

Features
  1. Decides which path the query should take through the system, since different types of user instructions/questions require different processing approaches.

  2. Uses language model or semantic analysis to determine the optimal processing path for each query.

  3. Supports route filtering to restrict available routes during processing.

  4. Provides configurable default routes and validation for route selection.

Note: We suggest using the content-based routing strategy, which routes based on query content analysis (it utilizes the Aurelio Labs library or embeddings).
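
For illustration, here is a minimal sketch of content-based routing with embeddings. This is not the SDK's actual API; `embed`, `ROUTES`, and `route` are illustrative stand-ins (a real system would use an EM Invoker or the Aurelio Labs router):

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model (e.g., via an EM Invoker).
    vec = [0.0] * 8
    for i, ch in enumerate(text.lower()):
        vec[i % 8] += ord(ch)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # vectors are unit-normalized

# Each route is described by example utterances.
ROUTES = {
    "smalltalk": ["hello there", "how are you"],
    "rag": ["what does the contract say about termination"],
}
DEFAULT_ROUTE = "rag"  # configurable fallback when nothing clears the threshold

def route(query: str, threshold: float = 0.7) -> str:
    q = embed(query)
    best_route, best_score = DEFAULT_ROUTE, threshold
    for name, examples in ROUTES.items():
        for example in examples:
            score = cosine(q, embed(example))
            if score > best_score:
                best_route, best_score = name, score
    return best_route

print(route("hi, how are you doing?"))
```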

⚙ Data Ingestion

Data Store

gllm-datastore | Related tutorials: Index Your Data with Vector Data Store, Your First RAG Pipeline | API Reference

A place to store knowledge, a.k.a. the knowledge base.

Features

Supported Data Store Types:

  1. Traditional SQL DB (see API Reference).

  2. Vector DB: Stores information as mathematical vectors for semantic search (see API Reference).

  3. Graph DB: Stores information as connected networks (see API Reference).

Document Processing Orchestrator

gllm-docproc | Related tutorials: Simple DPO Pipeline (Loader) | API Reference

Orchestrates document processing, from ingestion to storage in the data store.

Features
  1. Supported Types

    1. Document: .docx, .pdf, .pptx, .xlsx

    2. Text: .csv, .html, .java, .js, .jsx, .log, .md, .py, .ts, .tsx, .txt

    3. URL: any public URL that's not behind protection (e.g. IP block, anti-bot)

    4. Image (to text): .heic, .heif, .jpg, .jpeg, .png, .webp

    5. Audio (to text): .flac, .mp3, .ogg, .wav

    6. YouTube URL (to text; if not blocked by Google)

  2. Chunking Strategies (based on this article; an illustrative sketch follows this list)

    1. Structured Chunking

    2. Document-Based Chunking

    3. Table-Aware Chunking

    4. Content-Aware Chunking

    5. Recursive Chunking

  3. Data Store

    1. Store data into vector database.

    2. Store data into graph database.

  4. Miscellaneous

    1. Extracts basic math equations from PDFs.

    2. Integration with Datasaur's LLM Labs (including LLM Labs' Dynamic Chunking).

    3. Customizable (by extending the gllm-docproc library).
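
To make one of the chunking strategies above concrete, here is a minimal sketch of recursive chunking. It is illustrative only, not the gllm-docproc implementation; the separators and sizes are assumptions:

```python
def recursive_chunk(text: str, max_len: int = 200,
                    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Split on the coarsest separator first; recurse only into oversized pieces."""
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        parts = [p for p in text.split(sep) if p]
        if len(parts) > 1:
            chunks: list[str] = []
            for part in parts:
                chunks.extend(recursive_chunk(part, max_len, separators))
            return chunks
    # No separator produced a split: fall back to a hard cut.
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

print(recursive_chunk("First paragraph.\n\nA second, noticeably longer paragraph. " * 4))
```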

Limitations (planned to be supported)

  1. Can NOT process video yet.

  2. Can NOT store data into tabular database yet.

  3. Can NOT support transient processing yet.

    1. e.g., just extracting data from a document without chunking it or storing it in a vector database.

Limitations (no plans to support)

  1. PDF

    1. Can NOT extract advanced math equations.

  2. DOCX

    1. Can NOT extract math equations.

  3. URL

    1. Can NOT bypass URLs behind protection (e.g., IP blocks, anti-bot measures).

    2. Can NOT access social media (Facebook, Instagram, X, TikTok).

    3. Can NOT get a specific part of an HTML page (as the possible combinations are infinite).

      1. Specific projects can still customize the result by extending the gllm-docproc library.

  4. Can NOT process executable / package files (e.g. .dmg, .exe, .gz, .tar, .zip).

  5. Can NOT process files with proprietary extensions (e.g. .ai, .psd, .dll).

  6. Can NOT crawl/scrape URLs periodically.

    1. Specific projects are responsible for managing their own scheduler / cron.

⚙ Retrieval

Query Transformer

gllm-retrieval | Involves LM | Tutorial: Query Transformation | Use Case: Query Transformation | API Reference

Converts natural language into better retrieval queries using a language model.

Features
  1. Enhances the query for searching by rephrasing unclear questions or adding missing context.

  2. Uses a language model to improve your query.

  3. Supports various error handling strategies.

  4. Currently supported transformation strategies:

    1. One-to-one transformation: Creates one optimized query from input query; suitable for Step-Back Prompting or HyDE (Hypothetical Document Embeddings).

    2. Many-to-one transformation: Combines multiple queries into one optimal query; suitable for query expansion or fusion.

    3. One-to-many transformation: Expands a query into multiple queries; suitable for query expansion or query decomposition.

    4. Text-to-SQL transformation: Converts text to SQL; suitable for database-related questions/instructions.

Note: Use cases may vary depending on the specific retrieval requirements and data characteristics.
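
As an illustration of the one-to-many strategy above, here is a minimal sketch of query decomposition. `ask_lm` is a placeholder for a real language model client, and the canned reply stands in for a real completion:

```python
def ask_lm(prompt: str) -> str:
    # Placeholder: a real implementation would call a language model here.
    return ("What is RAG?\n"
            "How does RAG use retrieval?\n"
            "How does RAG use generation?")

def decompose(query: str, max_queries: int = 3) -> list[str]:
    prompt = (
        f"Break the following question into at most {max_queries} simpler "
        f"sub-questions, one per line.\n\nQuestion: {query}"
    )
    lines = [line.strip() for line in ask_lm(prompt).splitlines()]
    return [line for line in lines if line][:max_queries]

print(decompose("What is RAG and how does it combine retrieval with generation?"))
```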

Multimodal Transformer


Retrieval Parameter Extractor

gllm-retrieval | Involves LM | Tutorial: Retrieval Parameter Extractor | API Reference

Determines optimal search parameters for retrieval operations given a query.

Features
  1. Uses LLM to analyze queries and extract parameters; suitable for complex, context-aware parameter extraction.

  2. Extracts various retrieval parameters:

    1. Query: The search query string.

    2. Filters: Metadata filters with operators (eq, neq, gt, gte, lt, lte, in, nin, like).

    3. Sorting: Sort conditions with order (asc, desc); suitable for result ordering.

  3. Provides validation mechanisms for extracted parameters.

  4. Supports dynamic parameter adjustment based on query characteristics.
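
A minimal sketch of the idea, assuming the LM returns JSON (the `ask_lm` helper and field names are illustrative, not the SDK's API):

```python
import json

ALLOWED_OPS = {"eq", "neq", "gt", "gte", "lt", "lte", "in", "nin", "like"}

def ask_lm(prompt: str) -> str:
    # Placeholder: a real LM would return JSON describing the parameters.
    return ('{"query": "quarterly revenue", '
            '"filters": [{"field": "year", "op": "gte", "value": 2023}], '
            '"sort": [{"field": "quarter", "order": "asc"}]}')

def extract_parameters(user_query: str) -> dict:
    params = json.loads(ask_lm(f"Extract search parameters as JSON from: {user_query}"))
    # Validation: reject operators the retriever does not support.
    for f in params.get("filters", []):
        if f["op"] not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {f['op']}")
    return params

print(extract_parameters("Show quarterly revenue since 2023, oldest first"))
```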

Retriever

gllm-retrieval | Involves EM | Tutorial: Retriever | Use Case: Create the Retriever | API Reference

Searches through the knowledge base to find relevant information.

Features
  1. Searches through the knowledge base.

  2. Finds documents, passages, or data points relevant to your question.

  3. Supports multiple retrieval strategies such as:

    1. Vector Search: Semantic similarity using embeddings.

    2. Entity Relationships: Leverages structured knowledge graphs to find information through entity relationships and graph traversal patterns.

    3. SQL Search: Enables natural language to SQL conversion for querying structured databases.
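
For illustration, a minimal sketch of the vector search strategy using cosine similarity over a toy in-memory index (a real setup would use an EM Invoker and a vector data store):

```python
import math

# Toy in-memory index of (text, embedding) pairs.
INDEX = [
    ("Cats sleep a lot.", [0.9, 0.1, 0.0]),
    ("Dogs love walks.", [0.1, 0.9, 0.0]),
    ("Paris is in France.", [0.0, 0.1, 0.9]),
]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(query_vec: list[float], top_k: int = 2) -> list[tuple[float, str]]:
    scored = [(cosine(query_vec, vec), text) for text, vec in INDEX]
    return sorted(scored, reverse=True)[:top_k]

# A query embedding that lands near the "animals" region of the toy space:
print(retrieve([0.7, 0.7, 0.0]))
```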

Chunk Processor

gllm-retrieval | Tutorial: Chunk Processor | API Reference

Processes and optimizes retrieved chunks for better context handling.

Features
  1. Supports multiple processing strategies, including:

    1. Deduplication: Removes duplicate chunks based on content similarity; suitable for reducing redundancy in retrieved results.

    2. Merging: Combines related chunks into larger, more coherent segments; suitable for improving context continuity.

    3. Basic processing: Standard chunk processing without special modifications; suitable for simple retrieval scenarios.

  2. Uses similarity-based algorithms for deduplication and merging operations.

  3. Provides configurable similarity thresholds and processing parameters.

  4. Maintains chunk metadata and relationships during processing.
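
A minimal sketch of similarity-based deduplication, assuming chunks arrive with precomputed embeddings (the threshold and data are illustrative):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def deduplicate(chunks: list[tuple[str, list[float]]], threshold: float = 0.95):
    """Drop a chunk if it is near-identical (by embedding) to one already kept."""
    kept: list[tuple[str, list[float]]] = []
    for text, vec in chunks:
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((text, vec))
    return kept

chunks = [
    ("The cat sat.", [1.0, 0.0]),
    ("The cat sat down.", [0.99, 0.05]),  # near-duplicate: dropped
    ("Dogs bark.", [0.0, 1.0]),
]
print([text for text, _ in deduplicate(chunks)])
```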

Reranker

gllm-retrieval | Involves EM | Related tutorials: Reranking | API Reference

Reorders retrieved results by relevance and importance.

Features
  1. Supports multiple reranking methods, including:

    1. Similarity-based reranking: Uses embedding similarity scores; suitable for semantic relevance ranking.

    2. Text Embedding Inference (TEI): Uses TEI models for high-performance reranking; suitable for large-scale applications.

    3. FlagEmbedding-based reranking: Uses FlagEmbedding models; suitable for multilingual and specialized domains.

  2. Uses embedding models to calculate relevance scores.

  3. Provides configurable ranking thresholds and parameters.

  4. Supports batch processing for improved performance.
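
A minimal sketch of reranking: rescore already-retrieved candidates and sort. The `score` function here is a toy word-overlap stand-in for a real reranker model (e.g., a TEI or FlagEmbedding scorer):

```python
def score(query: str, doc: str) -> float:
    # Toy relevance score: word overlap between query and document.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)

def rerank(query: str, candidates: list[str], top_k: int = 2) -> list[str]:
    return sorted(candidates, key=lambda doc: score(query, doc), reverse=True)[:top_k]

candidates = ["cats nap often", "dogs love walks", "walks with dogs are fun"]
print(rerank("do dogs like walks", candidates))
```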

⚙ Generation

Compressor

gllm-generation | Tutorial: Compressor | API Reference

Reduces context size while preserving essential information.

Features
  1. Supports multiple compression methods, including:

    1. LLMLingua compression: Uses LLMLingua models for intelligent compression; suitable for high-quality content reduction.

    2. Basic compression: Standard compression without special algorithms; suitable for simple size reduction.

  2. Uses language models to identify and preserve important information.

  3. Provides configurable compression ratios and quality thresholds.

  4. Supports various compression strategies based on content type and requirements.
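
A minimal sketch of the budget-driven idea behind basic compression: keep the sentences most related to the query until a size budget is hit. This is not LLMLingua, which is model-based; everything here is illustrative:

```python
def compress(context: str, query: str, budget: int = 45) -> str:
    """Keep the sentences most related to the query until a character budget is hit."""
    q_words = set(query.lower().split())
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    # Rank sentences by overlap with the query, most relevant first.
    ranked = sorted(sentences,
                    key=lambda s: len(q_words & set(s.lower().split())),
                    reverse=True)
    out, used = [], 0
    for sentence in ranked:
        if used + len(sentence) > budget:
            break
        out.append(sentence)
        used += len(sentence)
    return ". ".join(out) + "."

context = "Cats nap often. Dogs love long walks. Walks keep dogs healthy. Paris is in France."
print(compress(context, "why do dogs like walks"))
```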

Context Enricher

gllm-generation | Tutorial: Context Enricher | API Reference

Enhances context with additional metadata and information.

Features
  1. Supports multiple enrichment strategies, including:

    1. Basic context enrichment: Adds fundamental metadata to chunks; suitable for simple context enhancement.

    2. Metadata-based enrichment: Enhances context with detailed metadata information; suitable for comprehensive context building.

  2. Uses language models to generate contextual information.

  3. Provides configurable enrichment parameters and formatting options.

  4. Supports metadata information formatting and structuring.
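
A minimal sketch of metadata-based enrichment: prepend formatted metadata so the language model sees each chunk's provenance (field names are illustrative):

```python
def enrich(chunk: str, metadata: dict) -> str:
    # Prepend formatted metadata so the LM sees each chunk's provenance.
    header = " | ".join(f"{key}: {value}" for key, value in metadata.items())
    return f"[{header}]\n{chunk}"

print(enrich("Termination requires 30 days notice.",
             {"source": "contract.pdf", "page": 12}))
```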

Reference Formatter

gllm-generation | Involves EM | Involves LM | Tutorial: Reference Formatter | Use Case: Adding Document References | API Reference

Formats citations and sources in generated responses.

Features
  1. Supports multiple formatting strategies, including:

    1. Language model-based formatting: Uses LLM to generate contextual citations; suitable for natural, integrated references.

    2. Similarity-based formatting: Uses embedding similarity for reference matching; suitable for precise source attribution.

    3. Basic formatting: Standard reference formatting; suitable for simple citation requirements.

  2. Uses language models or embedding models to enhance reference quality.

  3. Provides configurable citation formats and styles.

  4. Ensures proper attribution of information sources.
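
A minimal sketch of basic reference formatting: append a numbered source list built from chunk metadata (field names are illustrative; the LM- and similarity-based strategies place citations more precisely):

```python
def format_references(answer: str, sources: list[dict]) -> str:
    refs = "\n".join(f"[{i}] {s['title']} ({s['url']})"
                     for i, s in enumerate(sources, 1))
    return f"{answer}\n\nReferences:\n{refs}"

print(format_references(
    "Notice of 30 days is required before termination.",
    [{"title": "Service Contract", "url": "https://example.com/contract"}],
))
```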

Relevance Filter

gllm-generation | Involves EM | Involves LM | Tutorial: Relevance Filter | API Reference

Removes irrelevant information from retrieved context.

Features
  1. Supports multiple filtering methods, including:

    1. Semantic similarity filtering: Filters based on vector similarity scores; suitable for embedding-based relevance assessment.

    2. Language model-based filtering: Uses LLM to determine chunk relevance; suitable for context-aware filtering with high accuracy.

  2. Uses embedding models or language models to assess relevance.

  3. Provides configurable similarity thresholds and filtering criteria.

  4. Supports batch processing for improved performance.
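
A minimal sketch of language model-based filtering, with `ask_lm` as a placeholder client returning a canned toy verdict:

```python
def ask_lm(prompt: str) -> str:
    # Placeholder: a real LM would judge relevance; this toy checks the chunk text.
    return "yes" if "dog" in prompt.split("Chunk:")[1].lower() else "no"

def filter_relevant(query: str, chunks: list[str]) -> list[str]:
    kept = []
    for chunk in chunks:
        verdict = ask_lm(
            f"Does the chunk help answer the query? Answer yes or no.\n\n"
            f"Query: {query}\n\nChunk: {chunk}"
        )
        if verdict.strip().lower().startswith("yes"):
            kept.append(chunk)
    return kept

print(filter_relevant("which pets like walks",
                      ["Dogs love walks.", "Paris is in France."]))
```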

Repacker

gllm-generation | Tutorial: Repacker | Use Case: Your First RAG Pipeline | API Reference

Packages retrieved chunks into formats optimized for LLM understanding.

Features
  1. Supports multiple packing strategies, including:

    1. Forward packing: Maintains original chunk order; suitable for preserving document flow.

    2. Reverse packing: Reverses chunk order; suitable for prioritizing recent or important information.

    3. Sides packing: Alternates chunks from the end and the start; suitable for balanced context presentation.

  2. Provides configurable size limits and delimiter options.

  3. Supports both chunk-based and context-based packing modes.

  4. Includes size measurement functions for optimal packing.
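
For illustration, minimal sketches of the three packing orders (illustrative functions, not the SDK's API):

```python
def forward(chunks: list[str]) -> list[str]:
    return list(chunks)            # keep original order

def reverse(chunks: list[str]) -> list[str]:
    return list(reversed(chunks))  # last chunk first

def sides(chunks: list[str]) -> list[str]:
    # Alternate between the end and the start of the list.
    out, lo, hi = [], 0, len(chunks) - 1
    while lo <= hi:
        out.append(chunks[hi])
        if lo < hi:
            out.append(chunks[lo])
        lo, hi = lo + 1, hi - 1
    return out

print(sides(["c1", "c2", "c3", "c4", "c5"]))  # ['c5', 'c1', 'c4', 'c2', 'c3']
```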

Response Synthesizer

gllm-generation | Involves LM | Tutorial: Response Synthesizer | Use Case: Create the Response Synthesizer | API Reference

Generates final responses by combining query, context, and history.

Features
  1. Supports multiple synthesis strategies, including:

    1. Stuff synthesis: Combines all context into a single prompt; suitable for comprehensive responses.

    2. Static list synthesis: Uses predefined response templates; suitable for structured, consistent outputs.

  2. Uses language models to generate coherent and relevant responses.

  3. Supports streaming responses for real-time output.

  4. Provides configurable hyperparameters and system prompts.

  5. Handles multimodal content and attachments.
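
A minimal sketch of stuff synthesis: put history, repacked context, and the query into one prompt and hand it to a language model (`ask_lm` is a placeholder client):

```python
def ask_lm(prompt: str) -> str:
    # Placeholder: a real LM call (streaming or not) goes here.
    return "Dogs should be walked for about 30 minutes a day. [toy answer]"

def synthesize(query: str, context: str, history: list[str]) -> str:
    history_text = "\n".join(history)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Conversation so far:\n{history_text}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
    return ask_lm(prompt)

print(synthesize("How long should walks be?",
                 "Dogs need about 30 minutes of walking per day.",
                 ["user: hi", "assistant: hello!"]))
```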

⚙ Conversation History, Cache, and Memory Manager

Chat History Manager

gllm-misc | Involves LM | Related tutorials: Chat History | API Reference

Manages conversation history for consistent and contextual responses.

Features
  1. Supports multiple history processing methods, including:

    1. Similarity-based filtering: Filters message pairs using embedding similarity; suitable for removing redundant conversations.

    2. Language model-based processing: Uses LLM to select relevant message pairs; suitable for intelligent history curation.

  2. Uses language models or embedding models to process conversation history.

  3. Provides configurable data retention and deletion policies.

  4. Supports conversation threading and context management.

  5. Handles multiple storage backends.

Cache Manager

gllm-misc | Involves LM | Related tutorials: Caching Implementation | Use Case: Caching | API Reference

Caches frequently accessed information for improved response speed.

Features
  1. Supports multiple cache backends and strategies.

  2. Uses language models to generate cache keys and validate cached content.

  3. Provides configurable TTL and invalidation policies.

  4. Uses Data Store to store cache information.

  5. Supports cache warming and intelligent cache management.
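
A minimal sketch of the idea: a TTL cache keyed by a normalized query. Real deployments back this with a Data Store and smarter invalidation; all names here are illustrative:

```python
import time
from typing import Optional

class TTLCache:
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def _key(query: str) -> str:
        # Normalize case and whitespace so trivially different queries hit.
        return " ".join(query.lower().split())

    def get(self, query: str) -> Optional[str]:
        key = self._key(query)
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.time() - stored_at > self.ttl:  # expired: invalidate
            del self._store[key]
            return None
        return value

    def put(self, query: str, answer: str) -> None:
        self._store[self._key(query)] = (time.time(), answer)

cache = TTLCache(ttl_seconds=60)
cache.put("What is RAG?", "Retrieval-Augmented Generation.")
print(cache.get("what  is RAG?"))  # hits despite different casing/spacing
```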


⚙ Inference

Some components involve language or embedding models; these are marked with the Involves LM or Involves EM tag. The following key components enable a seamless inference process:

LM Request Processor

gllm-inference | Tutorial: LM Request Processor (LMRP) | Use Case: Utilize Language Model Request Processor | API Reference

Provides unified interface for LLM interactions.

Features
  1. Integrates prompt builder, LM invoker, and output parser into single interface.

  2. Provides unified interface for LLM interactions.

  3. Supports multiple LLM providers and configurations.

  4. Handles request processing and response management.
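
A minimal sketch of that composition (the function names are illustrative, not the SDK's classes):

```python
import json

def build_prompt(template: str, **variables) -> str:
    return template.format(**variables)          # prompt builder

def invoke_lm(prompt: str) -> str:
    # Placeholder for a provider call (OpenAI, Anthropic, Google, ...).
    return '{"sentiment": "positive"}'           # LM invoker

def parse_output(raw: str) -> dict:
    return json.loads(raw)                       # output parser

def process(template: str, **variables) -> dict:
    """The unified entry point: build, invoke, parse."""
    return parse_output(invoke_lm(build_prompt(template, **variables)))

print(process("Classify the sentiment of: {text}", text="I love this SDK"))
```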

Catalog

gllm-inference | Tutorial: Catalog | API Reference

Stores and creates LM request processors or prompt builders from external data sources.

Features
  1. Supports multiple data sources, including:

    1. Record-based creation: Creates processors from structured records; suitable for predefined configurations.

    2. Google Sheets integration: Creates processors from Google Sheets data; suitable for collaborative configurations.

    3. CSV file processing: Creates processors from CSV files; suitable for bulk configuration management.

  2. Provides automated processor creation from external data.

  3. Supports dynamic configuration updates.

  4. Enables easy deployment and management of LLM processors.
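
A minimal sketch of CSV-based catalog loading, where each row defines a named prompt template (column names are illustrative):

```python
import csv, io

CSV_DATA = """name,template
summarize,Summarize the following text: {text}
translate,Translate to English: {text}
"""

def load_catalog(csv_text: str) -> dict:
    catalog = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Each entry becomes a callable that builds a prompt from variables.
        catalog[row["name"]] = lambda template=row["template"], **v: template.format(**v)
    return catalog

catalog = load_catalog(CSV_DATA)
print(catalog["summarize"](text="RAG combines retrieval with generation."))
```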

Prompt Builder

gllm-inference | Tutorial: Prompt Builder | Use Case: Utilize Language Model Request Processor | API Reference

Constructs prompts from templates and dynamic content.

Features
  1. Supports variable substitution and conditional logic.

  2. Handles different prompt formats and structures.

  3. Provides template-based prompt construction.

  4. Supports dynamic content integration.

LM Invoker

gllm-inference | Tutorial: Language Model (LM) Invoker | Use Case: Utilize Language Model Request Processor | API Reference

Provides unified interface for interacting with multiple LM providers.

Features
  1. Supports multiple providers: OpenAI, Anthropic, Google, etc.

  2. Handles streaming, batching, and error management.

  3. Provides unified interface for different LLM services.

  4. Supports configurable model parameters and settings.

Output Parser

gllm-inference | Tutorial: Output Parser | Use Case: Produce Consistent Output from LM | API Reference

Extracts structured information from LM responses.

Features
  1. Validates response format and content.

  2. Handles parsing errors gracefully.

  3. Supports structured output extraction.

  4. Provides configurable parsing rules and validation.
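
A minimal sketch of graceful parsing: validate the response and fall back instead of crashing on malformed output (the schema is illustrative):

```python
import json

REQUIRED_KEYS = {"answer", "confidence"}

def parse(raw: str) -> dict:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Graceful fallback: return the raw text instead of crashing.
        return {"answer": raw.strip(), "confidence": None}
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"response is missing keys: {missing}")
    return data

print(parse('{"answer": "42", "confidence": 0.9}'))
print(parse("not json at all"))
```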

EM Invoker

gllm-inference | Tutorial: Embedding Model (EM) Invoker | Use Case: Your First RAG Pipeline | API Reference

Provides unified interface for interacting with multiple EM providers.

Features
  1. Converts text to vector representations.

  2. Supports multiple embedding providers and models.

  3. Provides unified interface for embedding operations.

  4. Handles batch processing and error management.

⚙️ Orchestration

Pipeline

gllm-pipeline | Tutorial: Pipeline | Use Cases: Build End-to-End RAG Pipeline, Execute a Pipeline | API Reference

Sequences and manages the execution of the components in our SDK.

Steps

gllm-pipeline | Tutorial: Steps | Use Case: Build End-to-End RAG Pipeline | API Reference

The building block of a Pipeline: reads from the state, performs an operation, and writes results back.
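
A minimal sketch of the step/pipeline model: plain functions over a shared state dict (the SDK's actual pipeline abstractions are richer):

```python
def retrieve_step(state: dict) -> dict:
    state["chunks"] = [f"a chunk about {state['query']}"]  # stand-in retrieval
    return state

def synthesize_step(state: dict) -> dict:
    state["response"] = f"Answer based on {len(state['chunks'])} chunk(s)."
    return state

def run_pipeline(steps, state: dict) -> dict:
    for step in steps:  # sequence the steps in order
        state = step(state)
    return state

final = run_pipeline([retrieve_step, synthesize_step], {"query": "dogs"})
print(final["response"])
```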
