Caching
This guide will walk you through implementing caching in your AI pipelines to eliminate redundant computations and improve performance. We'll explore how pipeline caching can transform expensive, repetitive operations into instant responses.
Caching functionality gives you control over performance optimization in your pipeline, providing flexibility to cache at different levels based on your specific needs. For example, you can implement step-level caching for expensive operations, pipeline-level caching for complete workflows, or combine both for maximum efficiency.
Installation
macOS/Linux (bash):

```bash
# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-rag gllm-core gllm-generation gllm-inference gllm-pipeline gllm-retrieval gllm-misc gllm-datastore
```

Windows (PowerShell):

```powershell
# you can use a Conda environment
$token = (gcloud auth print-access-token)
pip install --extra-index-url "https://oauth2accesstoken:$token@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-rag gllm-core gllm-generation gllm-inference gllm-pipeline gllm-retrieval gllm-misc gllm-datastore
```

Windows (CMD):

```bat
:: you can use a Conda environment
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO SET TOKEN=%T
pip install --extra-index-url "https://oauth2accesstoken:%TOKEN%@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-rag gllm-core gllm-generation gllm-inference gllm-pipeline gllm-retrieval gllm-misc gllm-datastore
```

You can either:
You can refer to the guide whenever you need explanation or want to clarify how each part works.
Follow along with each step to recreate the files yourself while learning about the components and how to integrate them.
Both options will work—choose based on whether you prefer speed or learning by doing!
Project Setup
Extend Your RAG Pipeline Project
Start with your completed RAG pipeline project from the previous tutorial. The caching functionality works with any pipeline component; we'll demonstrate it with your existing RAG pipeline:
Your existing structure is already complete:
<your-project>/
├── data/
│ └── imaginary_animals.csv
├── modules/
│ ├── __init__.py
│ ├── retriever.py
│ └── response_synthesizer.py
├── pipeline.py # 👈 Will be updated with caching
├── indexer.py
└── .env

Understanding Pipeline Caching
When you deploy pipelines to production, you quickly discover a common pattern: the same inputs get processed over and over again. Users ask similar questions, run identical analyses, or trigger the same computational workflows repeatedly.
The GLLM Pipeline framework provides two levels of caching that work seamlessly together: pipeline-level caching and step-level caching. Pipeline-level caching stores the entire pipeline's output for a given input, while step-level caching stores individual step results within the pipeline execution.
Our caching system uses a vector data store as the cache backend, which provides several advantages: semantic similarity matching (so similar inputs can benefit from cached results), scalable storage, and fast retrieval.
1) Set Up Your Cache Data Store
Create the cache data store
Before implementing any caching option, you need to set up a cache data store. Add this to your pipeline file:
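Here is a minimal sketch of that setup. The import path, class name, and constructor argument below are illustrative assumptions, not the confirmed gllm-datastore API; check the Vector Data Store page for the actual names.

```python
# pipeline.py
# NOTE: the import path, class name, and argument here are illustrative
# assumptions; consult the gllm-datastore docs for the real API.
from gllm_datastore.vector_data_store import VectorDataStore  # assumed import

# The vector data store doubles as the cache backend, so cached entries
# can later be matched exactly, fuzzily, or by semantic similarity.
cache_store = VectorDataStore(collection_name="pipeline_cache")  # assumed parameter
```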
You can also configure the matching strategy of the cache store (exact, fuzzy, or semantic match) by following the guide on the Vector Data Store page.
2) Choose Your Caching Strategy
Pipeline caching allows you to optimize performance at different levels based on your specific needs. Each approach can be implemented independently, giving you flexibility to choose the right caching strategy for your use case:
Step-Level Caching: Caches individual step results within pipeline execution
Pipeline-Level Caching: Caches complete pipeline outputs for given inputs
Multi-Level Caching: Combines both approaches for maximum efficiency
You can choose any combination of these options based on your performance requirements and use cases.
Option 1: Pipeline-Level Caching
When to use: Cache complete pipeline results when users frequently run identical workflows with the same inputs.
Enable caching for the entire pipeline
Create your pipeline with caching enabled:
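A hedged sketch of what this might look like, reusing the modules from the previous tutorial. Only the cache_store parameter name comes from this guide; the Pipeline constructor and the step factory names are assumptions.

```python
# pipeline.py
# NOTE: `cache_store` is the only name taken from this guide; the
# `Pipeline` constructor and step factories below are assumed shapes.
from gllm_pipeline import Pipeline  # assumed import

from modules.retriever import build_retriever_step                 # hypothetical factory
from modules.response_synthesizer import build_synthesizer_step    # hypothetical factory

retriever_step = build_retriever_step()
synthesizer_step = build_synthesizer_step()

# Passing the cache store at construction enables pipeline-level caching:
# the pipeline's complete output is cached per input.
pipeline = Pipeline(
    steps=[retriever_step, synthesizer_step],
    cache_store=cache_store,  # from step 1
)
```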
Benefits:
Maximum performance for repeated identical queries
Simple implementation: just add the cache_store parameter
Best for production environments with repetitive usage patterns
Option 2: Multi-Level Caching
We can also use step-level caching alongside pipeline-level caching. If the pipeline-level cache misses, each step with an active cache checks for a hit individually.
Enable caching at both the step and pipeline level
Create your steps and pipeline with caching enabled:
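A sketch using the same assumed names as in Option 1; here the cache store is attached to each step as well as to the pipeline itself:

```python
# NOTE: assumed shapes, as in Option 1. Attaching the cache store to a
# step caches that step's result; attaching it to the pipeline caches
# the complete output. On a pipeline-level miss, each cached step still
# checks for its own hit.
retriever_step = build_retriever_step(cache_store=cache_store)      # step-level cache
synthesizer_step = build_synthesizer_step(cache_store=cache_store)  # step-level cache

pipeline = Pipeline(
    steps=[retriever_step, synthesizer_step],
    cache_store=cache_store,  # pipeline-level cache on top
)
```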
3) Run the Pipeline
Configure the pipeline state for testing
Set up test cases to demonstrate caching behavior:
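For example (the state key and queries below are made up for illustration; use whatever state your pipeline from the previous tutorial expects):

```python
# Hypothetical test inputs; adapt the state key to your pipeline.
state = {"query": "What do imaginary animals eat?"}

# A near-duplicate query, useful for experimenting with semantic matching.
similar_state = {"query": "What is the diet of imaginary animals?"}
```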
Run the pipeline first time (cache miss)
This execution will populate both step-level and pipeline-level caches.
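Timing the first execution makes the later comparison concrete (`invoke` is an assumed method name; substitute your pipeline's actual entry point):

```python
import asyncio
import time

start = time.perf_counter()
result = asyncio.run(pipeline.invoke(state))  # assumed method; first run does the full work
print(f"First run (cache miss): {time.perf_counter() - start:.2f}s")
```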
Run the same pipeline again (cache hit)
You should see a significant speedup on the second run, since the result is served directly from the cache.
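Continuing the same script, repeat with the identical input:

```python
start = time.perf_counter()
result = asyncio.run(pipeline.invoke(state))  # identical input, served from cache
print(f"Second run (cache hit): {time.perf_counter() - start:.2f}s")
```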
Troubleshooting
Cache not providing expected speedup:
Verify debug logs show cache hits/misses as expected (see the snippet after this list)
Ensure your inputs are similar enough to trigger cache hits
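One quick way to surface those logs in a local test, assuming the gllm packages emit cache hit/miss messages through Python's standard logging module:

```python
import logging

# Assumption: the gllm packages log cache hits/misses via the standard
# logging module; raising the level to DEBUG makes them visible.
logging.basicConfig(level=logging.DEBUG)
```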
General caching issues:
Verify your cache data store is properly initialized
Check that cache keys are being generated consistently
Monitor cache hit/miss rates to optimize cache configuration
Test cache behavior with various input patterns
Congratulations! You've successfully enhanced your RAG pipeline with multi-level caching functionality. Your pipeline can now eliminate redundant computations and provide dramatic performance improvements for repeated or similar requests. This caching system scales with your application and provides intelligent matching for optimal cache utilization.