Query Transformation
This guide walks you through adding a Query Transformer component to your existing RAG pipeline. The component rewrites each user query before retrieval so that the retriever receives a clearer, more specific search query. This tends to surface more relevant documents and, in turn, produce better responses.
Installation
Linux / macOS:
# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-rag gllm-core gllm-generation gllm-inference gllm-pipeline gllm-retrieval gllm-misc gllm-datastore
Windows (Command Prompt):
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-rag gllm-core gllm-generation gllm-inference gllm-pipeline gllm-retrieval gllm-misc gllm-datastore
How to Use this Guide
You can either:
Download or copy the complete guide files from the 📂 Complete Guide Files section at the end of this page to get everything ready instantly. You can refer back to this guide whenever you want an explanation of how each part works.
Follow along with each step to recreate the files yourself while learning about the components and how to integrate them.
Both options work; choose based on whether you prefer speed or learning by doing!
Project Setup
Extend Your RAG Pipeline Project
Start with your completed RAG pipeline project from the Your First RAG Pipeline tutorial. This tutorial adds no new files, so the project structure stays as is:
<project-name>/
├── data/
│ ├── <index>/... # preset data index folder
│ ├── chroma.sqlite3 # preset database file
│ └── imaginary_animals.csv # sample data
├── modules/
│ ├── retriever.py
│ └── response_synthesizer.py
├── .env
├── indexer.py
└── pipeline.py # 👈 Will be updated with query transformer
1) Build the Query Transformer Pipeline
Define extended RAG state
Create a custom state that adds a field for the transformed query:
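As a minimal sketch of the idea, the extended state can be pictured as a typed dictionary that inherits the base fields and adds one more. The `RAGState` below is a hypothetical stand-in written with `TypedDict`, not the class shipped with the SDK, and the field names (`user_query`, `chunks`, `response`, `transformed_query`) are assumptions for illustration.

```python
from typing import TypedDict, get_type_hints


class RAGState(TypedDict, total=False):
    """Hypothetical stand-in for the base RAG state (not the SDK class)."""

    user_query: str
    chunks: list[str]
    response: str


class RAGStateWithQT(RAGState, total=False):
    """Extended state that also carries the transformed query."""

    transformed_query: str  # assumed field name


# The extended state accepts both the inherited and the new field.
state: RAGStateWithQT = {"user_query": "animals that glow"}
state["transformed_query"] = "bioluminescent imaginary animals"

print(sorted(get_type_hints(RAGStateWithQT)))
```

Inheriting from the base state keeps every existing step working unchanged, because the fields they read and write are still present.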
Create all pipeline steps
Define all steps including the new query transformer step:
Compose the final pipeline
Chain all steps including the query transformer:
This creates a pipeline that first transforms the user query before retrieving relevant documents, leading to better search results.
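The ordering described above (transform, then retrieve, then synthesize) can be illustrated with plain Python functions. This is a sketch under stated assumptions: the step functions, the `compose` helper, the state keys, and the two-document corpus are all made up for illustration, and a real query transformer would call a language model rather than normalize text.

```python
from typing import Callable

State = dict  # simplified; the actual pipeline uses a typed state class
Step = Callable[[State], State]


def transform_query(state: State) -> State:
    """Stand-in for the query transformer (an LM would rewrite the query here)."""
    rewritten = state["user_query"].strip().rstrip("?").lower()
    return {**state, "transformed_query": rewritten}


def retrieve(state: State) -> State:
    """Toy retriever: rank an in-memory corpus by keyword overlap."""
    corpus = [
        "the luminafox is a glowing fox that lives in caves",
        "the aquaowl is an owl that hunts underwater",
    ]
    terms = set(state["transformed_query"].split())
    ranked = sorted(corpus, key=lambda doc: len(terms & set(doc.split())), reverse=True)
    return {**state, "chunks": ranked[:1]}


def synthesize(state: State) -> State:
    """Toy response synthesizer: echo the top chunk."""
    return {**state, "response": f"Based on: {state['chunks'][0]}"}


def compose(*steps: Step) -> Step:
    """Chain steps so each one receives the state produced by the previous one."""

    def run(state: State) -> State:
        for step in steps:
            state = step(state)
        return state

    return run


pipeline = compose(transform_query, retrieve, synthesize)
result = pipeline({"user_query": "Which animal hunts underwater?"})
print(result["transformed_query"])
print(result["response"])
```

Note that the retriever reads `transformed_query` rather than `user_query`; that single change is what routes the rewritten query into retrieval.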
🧠 RAGStateWithQT extends the base RAGState to include the transformed query field.
2) Run the Pipeline
Configure and invoke the pipeline
Configure the state and config for direct pipeline invocation:
Observe output
If you successfully run all the steps, you will see something like this:
Extending the Query Transformation System
Multiple Query Transformation Strategies
You can extend the system with different transformation approaches:
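One way to picture this is a small registry that maps a strategy name to a function returning one or more rewritten queries. The sketch below is illustrative only: the strategy names and the rule-based rewrites are assumptions standing in for LM-backed transformers, and none of it is the SDK's API.

```python
import re
from typing import Callable

Transformer = Callable[[str], list[str]]


def passthrough(query: str) -> list[str]:
    """Return the query unchanged."""
    return [query]


def decompose(query: str) -> list[str]:
    """Split a compound question into sub-queries (toy rule: split on 'and')."""
    parts = [p.strip(" ?") for p in re.split(r"\s+and\s+", query)]
    parts = [p for p in parts if p]
    return parts or [query]


def expand(query: str) -> list[str]:
    """Keep the original query and add a keyword-only variant."""
    stopwords = {"what", "is", "the", "a", "an", "of", "do", "does", "how"}
    keywords = [w for w in re.findall(r"\w+", query.lower()) if w not in stopwords]
    return [query, " ".join(keywords)]


STRATEGIES: dict[str, Transformer] = {
    "passthrough": passthrough,
    "decompose": decompose,
    "expand": expand,
}


def transform(query: str, strategy: str = "passthrough") -> list[str]:
    """Look up the named strategy and apply it to the query."""
    return STRATEGIES[strategy](query)


print(transform("Where does it live and what does it eat?", "decompose"))
print(transform("What is the habitat of a dragon?", "expand"))
```

Returning a list from every strategy keeps the retriever interface uniform: it can run each query and merge the results regardless of which strategy produced them.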
Domain-Specific Query Transformers
Create specialized transformers for different content domains:
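A simple way to specialize is to keep one system prompt per domain and route each query to the matching prompt. The following is a hedged sketch: the domains, keyword lists, and prompt wording are invented for illustration, and keyword matching is a deliberately naive stand-in for whatever routing you actually use.

```python
import re

# Hypothetical prompt templates, one per content domain.
DOMAIN_PROMPTS = {
    "legal": "Rewrite the query using precise legal terminology. Query: {query}",
    "medical": "Rewrite the query using standard clinical terms. Query: {query}",
    "general": "Rewrite the query to be clear and specific. Query: {query}",
}

# Hypothetical trigger words used to guess the domain of a query.
DOMAIN_KEYWORDS = {
    "legal": {"contract", "liability", "clause"},
    "medical": {"symptom", "dosage", "diagnosis"},
}


def detect_domain(query: str) -> str:
    """Return the first domain whose keywords appear in the query."""
    words = set(re.findall(r"\w+", query.lower()))
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if words & keywords:
            return domain
    return "general"


def build_prompt(query: str) -> str:
    """Fill the domain-specific template with the user query."""
    return DOMAIN_PROMPTS[detect_domain(query)].format(query=query)


print(build_prompt("What dosage is safe for children?"))
```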
Custom Query Transformation Logic
You can implement custom transformation logic:
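As one illustration of custom logic, a transformer can expand project-specific abbreviations before the query reaches the retriever. The class name, its `transform` method, and the glossary entries below are hypothetical; how you would plug such a class into the pipeline depends on the component interface your project already uses.

```python
import re
from dataclasses import dataclass, field


@dataclass
class GlossaryQueryTransformer:
    """Hypothetical custom transformer that expands known abbreviations."""

    glossary: dict[str, str] = field(default_factory=dict)

    def transform(self, query: str) -> str:
        """Replace each word found in the glossary; leave other words as they are."""

        def replace(match: re.Match) -> str:
            word = match.group(0)
            return self.glossary.get(word.lower(), word)

        return re.sub(r"\b\w+\b", replace, query)


transformer = GlossaryQueryTransformer(
    glossary={"rag": "retrieval-augmented generation", "qt": "query transformation"}
)
print(transformer.transform("How does QT help RAG?"))
```

Deterministic rules like this are cheap and predictable, so they can run before (or instead of) an LM-based rewrite for queries where the vocabulary mismatch is well understood.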
Troubleshooting
Common Issues
Poor query transformations:
Review and refine your system template for the query transformer
Ensure the transformation model (GPT-4o-mini) is appropriate for your use case
Test different system prompts to improve transformation quality
Query transformation taking too long:
Consider using a faster model for query transformation
Implement caching for frequently transformed queries
Set appropriate timeout configurations in your LM request processor
Transformed queries not improving retrieval:
Analyze the transformed queries to ensure they're more specific
Test with different transformation strategies
Consider the quality and indexing of your document corpus
Pipeline state management issues:
Ensure your custom RAGState class properly extends the base RAGState
Verify that all state field names match between pipeline steps
Check that the state_type is properly assigned to your pipeline
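Field-name mismatches can be caught before running the pipeline by comparing the keys each step uses against the fields declared on the state type. The sketch below assumes a `TypedDict`-style state and a hand-written `STEP_FIELDS` mapping; both are hypothetical stand-ins for your actual state class and step configuration.

```python
from typing import TypedDict, get_type_hints


class RAGStateWithQT(TypedDict, total=False):
    """Hypothetical state type with assumed field names."""

    user_query: str
    transformed_query: str
    chunks: list[str]
    response: str


# Hypothetical wiring: the state keys each step reads and writes.
STEP_FIELDS = {
    "query_transformer": {"reads": ["user_query"], "writes": ["transformed_query"]},
    "retriever": {"reads": ["transformed_query"], "writes": ["chunks"]},
    "response_synthesizer": {"reads": ["user_query", "chunks"], "writes": ["response"]},
}


def find_unknown_fields(state_type: type, step_fields: dict) -> dict[str, list[str]]:
    """Return, per step, any field names that the state type does not declare."""
    known = set(get_type_hints(state_type))
    problems: dict[str, list[str]] = {}
    for step, io in step_fields.items():
        unknown = [name for name in io["reads"] + io["writes"] if name not in known]
        if unknown:
            problems[step] = unknown
    return problems


print(find_unknown_fields(RAGStateWithQT, STEP_FIELDS))
```

An empty result means every step refers only to declared fields; a typo such as `transformed_qeury` would be reported under the step that uses it.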
Debug Tips
Enable debug mode: Set debug: true in your request to see detailed logs
Log query transformations: Use the log step to see original vs transformed queries
Test transformations in isolation: Test your query transformer component separately
Compare retrieval results: Compare document retrieval with and without query transformation
Monitor transformation quality: Manually review a sample of transformed queries for quality
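To make the with/without comparison concrete, you can score both retrieval runs against a small hand-labeled set of relevant documents. This is a minimal sketch: the document IDs and relevance labels are invented, and recall at k is just one reasonable metric among several.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def compare_runs(
    baseline: list[str], transformed: list[str], relevant: set[str], k: int = 3
) -> dict[str, float]:
    """Score retrieval with and without query transformation side by side."""
    return {
        "baseline_recall": recall_at_k(baseline, relevant, k),
        "transformed_recall": recall_at_k(transformed, relevant, k),
        # Share of top-k slots returned by both runs.
        "overlap": len(set(baseline[:k]) & set(transformed[:k])) / k,
    }


# Made-up document IDs: results for the original and the transformed query.
report = compare_runs(
    baseline=["d1", "d4", "d5"],
    transformed=["d1", "d2", "d3"],
    relevant={"d1", "d2"},
)
print(report)
```

A low overlap combined with unchanged recall suggests the transformer is changing results without improving them, which is a signal to revisit the transformation prompt.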
Congratulations! You've implemented a Query Transformer component in your RAG pipeline. By rewriting user queries before retrieval, the pipeline can return more relevant documents from your knowledge base and produce better responses.