Query Transformation

This guide walks you through adding a Query Transformer component to your existing RAG pipeline. The component automatically rewrites user queries before retrieval so that they are better suited for document search.

Users often phrase questions vaguely or conversationally. Reformulating a query into a clearer, more specific form gives the retriever a better chance of finding relevant documents. That, in turn, gives the response synthesizer better context to answer from.

This tutorial extends the Your First RAG Pipeline tutorial. Make sure you have followed its instructions to set up your repository.

Prerequisites

This example specifically requires:

  1. Completion of the Your First RAG Pipeline tutorial, since this tutorial builds directly on top of it

  2. Completion of all setup steps listed on the Prerequisites page

  3. A working OpenAI API key configured in your environment variables

You should be familiar with these concepts and components:

  1. Components in Your First RAG Pipeline (required foundation)

  2. query-transformer

View full project code on GitHub

Installation

# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-rag gllm-core gllm-generation gllm-inference gllm-pipeline gllm-retrieval gllm-misc gllm-datastore

You can either:

  1. Use the full project code on GitHub and refer to this guide whenever you need an explanation of how each part works.

  2. Follow along with each step to recreate the files yourself while learning about the components and how to integrate them.

Both options work; choose based on whether you prefer speed or learning by doing.

Project Setup

1

Extend Your RAG Pipeline Project

Start with your completed RAG pipeline project from the Your First RAG Pipeline tutorial. No new files are needed for this tutorial, so the project structure stays the same:

<project-name>/
├── data/
│   ├── <index>/...                     # preset data index folder
│   ├── chroma.sqlite3                  # preset database file
│   └── imaginary_animals.csv           # sample data
├── modules/
│   ├── retriever.py
│   └── response_synthesizer.py
├── .env
├── indexer.py
└── pipeline.py    # 👈 Will be updated with query transformer

1) Build the Query Transformer Pipeline

1

Define extended RAG state

Create a custom state that extends the base RAG state with a field for the transformed query:
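As a rough, library-independent sketch, the extended state can be modeled as a `TypedDict` that inherits every base field and adds one more. The base class shown here and all field names (`user_query`, `chunks`, `response`, `transformed_query`) are hypothetical stand-ins. They are not the actual gllm `RAGState` definition.

```python
from typing import TypedDict


class RAGState(TypedDict):
    """Hypothetical stand-in for the base RAGState from the previous tutorial."""

    user_query: str    # the question as the user typed it
    chunks: list[str]  # documents returned by the retriever
    response: str      # final answer from the response synthesizer


class RAGStateWithQT(RAGState):
    """Base state plus a slot for the query transformer's output."""

    transformed_query: str  # hypothetical field name


def initial_state(query: str) -> RAGStateWithQT:
    # Only the user's query is known up front; later steps fill in the rest.
    return RAGStateWithQT(user_query=query, chunks=[], response="", transformed_query="")


state = initial_state("Which imaginary animals can fly?")
print(sorted(RAGStateWithQT.__annotations__))
```

Because the subclass inherits every base field, steps written for the original pipeline keep working unchanged. Only the new transformer step and the retriever need to know about `transformed_query`.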

2

Create all pipeline steps

Define all steps including the new query transformer step:
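The key change is in the data flow. The transformer step writes a new field, and the retriever reads that field instead of the raw query. The sketch below shows this with plain-Python step factories. These are not gllm-pipeline APIs. The keyword retriever, the synthesizer placeholder, and the three-line corpus are made-up toys, not the tutorial's `imaginary_animals.csv` data.

```python
from typing import Any, Callable

State = dict[str, Any]
Step = Callable[[State], State]


def make_transform_step(transform: Callable[[str], str]) -> Step:
    """Query transformer step: reads user_query, writes transformed_query."""
    def step(state: State) -> State:
        return {**state, "transformed_query": transform(state["user_query"])}
    return step


def make_retrieve_step(corpus: list[str], top_k: int = 2) -> Step:
    """Retriever step: searches with transformed_query instead of user_query."""
    def step(state: State) -> State:
        terms = set(state["transformed_query"].lower().split())
        # Toy keyword scoring: rank documents by how many query terms they share.
        ranked = sorted(corpus, key=lambda doc: len(terms & set(doc.lower().split())), reverse=True)
        return {**state, "chunks": ranked[:top_k]}
    return step


def make_synthesize_step() -> Step:
    """Response synthesizer step: placeholder for the LLM call."""
    def step(state: State) -> State:
        return {**state, "response": f"Answer grounded in {len(state['chunks'])} chunk(s)"}
    return step


# Usage with a made-up corpus and a trivial rewrite rule.
corpus = ["skywhale can fly", "stonebeak lives in caves", "glimmerfox glows at night"]
transform = make_transform_step(lambda q: q.lower().replace("flying", "fly"))
retrieve = make_retrieve_step(corpus, top_k=1)
state = retrieve(transform({"user_query": "Flying animals"}))
print(state["chunks"])
```

Here the raw query "Flying animals" shares no word with any document. The rewritten "fly animals" matches "skywhale can fly". This is the kind of gap a real LLM-based transformer is meant to close.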

3

Compose the final pipeline

Chain all steps including the query transformer:

The resulting pipeline transforms the user query first and then retrieves documents using the transformed query, which typically yields more relevant search results than retrieving with the raw query.
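What matters when chaining is order: the transformer must run before the retriever so that retrieval sees the rewritten query. Below is a minimal, framework-agnostic sketch of left-to-right composition. `compose` and the toy steps are hypothetical. They do not reflect how gllm-pipeline chains steps internally.

```python
from functools import reduce
from typing import Any, Callable

State = dict[str, Any]
Step = Callable[[State], State]


def compose(*steps: Step) -> Step:
    """Chain steps left to right: each step receives the previous step's output state."""
    def pipeline(state: State) -> State:
        return reduce(lambda acc, step: step(acc), steps, state)
    return pipeline


# Toy steps standing in for the query transformer, retriever, and response synthesizer.
def transform(state: State) -> State:
    return {**state, "transformed_query": state["user_query"].strip().lower()}


def retrieve(state: State) -> State:
    return {**state, "chunks": [f"doc about {state['transformed_query']}"]}


def synthesize(state: State) -> State:
    return {**state, "response": f"{len(state['chunks'])} chunk(s) used"}


# The transformer comes first so the retriever sees the rewritten query.
pipeline = compose(transform, retrieve, synthesize)
result = pipeline({"user_query": "  Flying Animals  "})
print(result["response"])
```

Swapping `transform` and `retrieve` makes this sketch fail with a `KeyError`, because the retriever would look for a field that no step has written yet.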

🧠 RAGStateWithQT extends the base RAGState with a field that holds the transformed query.

2) Run the Pipeline

When running the pipeline, you may encounter an error like this:

[2025-08-26T14:36:10+0700.550 chromadb.telemetry.product.posthog ERROR] Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given

You can safely ignore this error. It comes from ChromaDB's telemetry feature, which this tutorial does not use, and your pipeline should still work.

1

Configure and invoke the pipeline

Configure the state and config for direct pipeline invocation:
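The sketch below illustrates the invocation pattern. It assumes the composed pipeline is an async callable that takes an initial state and a config mapping. Both that call shape and the `debug` key are assumptions, so check the full project code on GitHub for the exact invocation. The `pipeline` function here is a hypothetical stand-in that only performs a trivial transformation.

```python
import asyncio
from typing import Any

State = dict[str, Any]


async def pipeline(state: State, config: dict[str, Any]) -> State:
    """Hypothetical async stand-in for the composed RAG pipeline."""
    transformed = " ".join(state["user_query"].split())
    if config.get("debug"):
        # Show the query before and after transformation.
        print(f"original={state['user_query']!r} transformed={transformed!r}")
    return {**state, "transformed_query": transformed}


state: State = {
    "user_query": "  What do   imaginary animals eat? ",
    "transformed_query": "",  # written by the query transformer step
    "chunks": [],             # written by the retriever step
    "response": "",           # written by the response synthesizer step
}
config = {"debug": True}  # hypothetical config key

result = asyncio.run(pipeline(state, config))
print(result["transformed_query"])
```

Initializing every state field explicitly makes a misspelled or missing field show up immediately rather than partway through the pipeline.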

2

Observe output

If you successfully run all the steps, you will see something like this:

Extending the Query Transformation System

Multiple Query Transformation Strategies

You can extend the system with different transformation approaches:
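One common approach is multi-query retrieval. You generate several reformulations of the same question, retrieve for each, and merge the results. The sketch below uses two rule-based strategies as stand-ins for what would typically be separate LLM prompts. `keyword_only`, `expand_synonyms`, `run_strategies`, the stopword list, and the `SYNONYMS` table are all hypothetical.

```python
from typing import Callable

Strategy = Callable[[str], str]

# Hypothetical synonym table; an LLM prompt would usually play this role.
SYNONYMS = {"fly": ["soar", "glide"], "eat": ["feed", "diet"]}
STOPWORDS = {"what", "which", "do", "does", "the", "a", "an", "can", "is", "are"}


def keyword_only(query: str) -> str:
    """Strategy 1: strip question words to get a compact keyword query."""
    words = query.lower().rstrip("?").split()
    return " ".join(w for w in words if w not in STOPWORDS)


def expand_synonyms(query: str) -> str:
    """Strategy 2: append synonyms so documents using other wording still match."""
    words = query.lower().rstrip("?").split()
    extra = [s for w in words for s in SYNONYMS.get(w, [])]
    return " ".join(words + extra)


def run_strategies(query: str, strategies: list[Strategy]) -> list[str]:
    """Apply every strategy and keep unique variants, with the original query first."""
    variants = [query]
    for strategy in strategies:
        candidate = strategy(query)
        if candidate and candidate not in variants:
            variants.append(candidate)
    return variants


variants = run_strategies("Which animals can fly?", [keyword_only, expand_synonyms])
for v in variants:
    print(v)
```

Keeping the original query in the list is a deliberate hedge: if a rewrite drifts away from the user's intent, the untouched query still contributes its own results.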

Domain-Specific Query Transformers

Create specialized transformers for different content domains:
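A domain-specific setup usually has two parts: a router that decides which domain a query belongs to, and a transformer tuned for each domain. The sketch below routes on trigger words and maps colloquial terms to each domain's preferred vocabulary. The domains, trigger words, glossaries, and function names are all hypothetical. A production router might use an LLM classifier instead of keyword matching.

```python
from typing import Callable

Transformer = Callable[[str], str]


def make_glossary_transformer(glossary: dict[str, str]) -> Transformer:
    """Build a transformer that maps colloquial words to a domain's preferred terms."""
    def transform(query: str) -> str:
        return " ".join(glossary.get(word, word) for word in query.lower().split())
    return transform


# Hypothetical domains, each with its own trigger words and glossary.
DOMAINS: dict[str, tuple[set[str], Transformer]] = {
    "animals": ({"animal", "creature", "beast"},
                make_glossary_transformer({"creature": "animal", "beast": "animal"})),
    "code": ({"python", "function", "bug"},
             make_glossary_transformer({"bug": "error", "func": "function"})),
}


def route_and_transform(query: str) -> tuple[str, str]:
    """Return (domain, transformed query); fall back to whitespace cleanup if no domain matches."""
    words = set(query.lower().split())
    for domain, (triggers, transformer) in DOMAINS.items():
        if words & triggers:
            return domain, transformer(query)
    return "general", " ".join(query.split())


print(route_and_transform("Which creature can glow"))
```

Note that this toy router splits on whitespace only, so a trigger word with attached punctuation (for example `creature?`) would not match. Strip punctuation first if that matters for your queries.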

Custom Query Transformation Logic

You can implement custom transformation logic:
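Custom logic does not have to involve an LLM. Deterministic cleanup is cheap and predictable, and it can run before or instead of a model call. The class below is a hypothetical example that normalizes whitespace, expands known abbreviations, and drops a trailing question mark. Its name and `transform` method are illustrative, not a gllm interface.

```python
import re
from dataclasses import dataclass, field


@dataclass
class RuleBasedQueryTransformer:
    """Hypothetical custom transformer applying deterministic rewrite rules."""

    abbreviations: dict[str, str] = field(default_factory=dict)

    def transform(self, query: str) -> str:
        # 1. Collapse runs of whitespace and trim the ends.
        text = re.sub(r"\s+", " ", query).strip()
        # 2. Expand known abbreviations, matching whole words case-insensitively.
        for short, full in self.abbreviations.items():
            text = re.sub(rf"\b{re.escape(short)}\b", lambda _m, f=full: f, text, flags=re.IGNORECASE)
        # 3. Drop a trailing question mark, which adds nothing to keyword search.
        return text.rstrip("?")


transformer = RuleBasedQueryTransformer(abbreviations={"hab": "habitat", "info": "information"})
print(transformer.transform("  HAB  info for   dragons? "))
```

The replacement is passed as a function rather than a string so that `re.sub` does not interpret backslashes or group references in the expansion text.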

Troubleshooting

Common Issues

  1. Poor query transformations:

    • Review and refine your system template for the query transformer

    • Ensure the transformation model (GPT-4o-mini) is appropriate for your use case

    • Test different system prompts to improve transformation quality

  2. Query transformation taking too long:

    • Consider using a faster model for query transformation

    • Implement caching for frequently transformed queries

    • Set appropriate timeout configurations in your LM request processor

  3. Transformed queries not improving retrieval:

    • Analyze the transformed queries to ensure they're more specific

    • Test with different transformation strategies

    • Consider the quality and indexing of your document corpus

  4. Pipeline state management issues:

    • Ensure your custom RAGState class properly extends the base RAGState

    • Verify that all state field names match between pipeline steps

    • Check that the state_type is properly assigned to your pipeline
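For the caching suggestion in item 2, here is a minimal in-process sketch using `functools.lru_cache`. It normalizes the query before the cache lookup so that trivially different spellings share one entry. `expensive_transform` is a hypothetical stand-in for the LLM-backed transformer, and the call counter exists only to show that the cache works. A multi-process deployment would need a shared cache instead.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the expensive transformer actually runs


def expensive_transform(query: str) -> str:
    """Hypothetical stand-in for an LLM-backed query transformer call."""
    CALLS["count"] += 1
    return f"rewritten: {query}"


@lru_cache(maxsize=1024)
def cached_transform(normalized_query: str) -> str:
    return expensive_transform(normalized_query)


def transform_with_cache(query: str) -> str:
    # Normalize case and whitespace first so near-duplicate queries hit the same entry.
    return cached_transform(" ".join(query.split()).lower())


first = transform_with_cache("Which animals fly?")
second = transform_with_cache("  which   animals FLY? ")
print(first, CALLS["count"])
```

Besides saving latency, caching also keeps the transformation of a repeated query consistent, which LLM outputs otherwise do not guarantee.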

Debug Tips

  1. Enable debug mode: Set debug: true in your request to see detailed logs

  2. Log query transformations: Use the log step to see original vs transformed queries

  3. Test transformations in isolation: Test your query transformer component separately

  4. Compare retrieval results: Compare document retrieval with and without query transformation

  5. Monitor transformation quality: Manually review a sample of transformed queries for quality
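Tips 2 and 4 can be combined into a small side-by-side check. The check logs the original and transformed queries together with what each one retrieves, and it reports which documents only the transformed query found. The keyword retriever, `compare`, and the three-line corpus are made-up toys. Swap in your real retriever to use this against your own index.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("qt-debug")

# Made-up toy corpus; replace with calls to your real retriever.
CORPUS = [
    "skywhale soar above clouds",
    "stonebeak burrow underground",
    "glimmerfox glide between trees",
]


def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Toy keyword retriever: return documents sharing at least one term, best first."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.split())), doc) for doc in CORPUS]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked if score > 0][:top_k]


def compare(original: str, transformed: str) -> dict[str, list[str]]:
    """Log and return retrieval results with and without query transformation."""
    before = retrieve(original)
    after = retrieve(transformed)
    logger.info("original query:    %r -> %s", original, before)
    logger.info("transformed query: %r -> %s", transformed, after)
    return {"before": before, "after": after, "only_after": [d for d in after if d not in before]}


report = compare("Which animals can fly?", "animals soar glide")
```

Running a comparison like this over a sample of real queries is one concrete way to apply tip 5. It shows whether the transformer is actually surfacing new, relevant documents or just rephrasing without effect.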


Congratulations! You've successfully added a Query Transformer component to your RAG pipeline. By rewriting user queries before retrieval, the pipeline can retrieve more relevant documents from your knowledge base and give the response synthesizer better context to answer from.
