Implement Semantic Routing
This guide will walk you through adding semantic routing to your existing RAG pipeline to intelligently route different types of queries to specialized handlers.
Semantic routing allows your pipeline to automatically decide whether a query needs knowledge base retrieval or can be answered with general knowledge, making your application more efficient and providing better responses.
Installation
Linux/macOS:
# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-rag gllm-core gllm-generation gllm-inference gllm-pipeline gllm-retrieval gllm-misc gllm-datastore
Windows (Command Prompt):
# you can use a Conda environment
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-rag gllm-core gllm-generation gllm-inference gllm-pipeline gllm-retrieval gllm-misc gllm-datastore
You can either:
You can refer to the guide whenever you need explanation or want to clarify how each part works.
Follow along with each step to recreate the files yourself while learning about the components and how to integrate them.
Both options work; choose based on whether you prefer speed or learning by doing.
Project Setup
Extend Your RAG Pipeline Project
Start with your completed RAG pipeline project from the Your First RAG Pipeline tutorial. You'll extend the project structure as follows:
<project-name>/
├── data/
│ ├── <index>/... # preset data index folder
│ ├── chroma.sqlite3 # preset database file
│ ├── imaginary_animals.csv # sample data
├── modules/
│ ├── retriever.py
│ ├── response_synthesizer.py
│ ├── semantic_router.py # new in this guide
│ └── handlers.py # new in this guide
├── .env
├── indexer.py
├── route_examples.json
└── pipeline.py # 👈 Will be updated with semantic router
Create the route examples file
Create a new file route_examples.json to define your routing categories:
[
{
"name": "knowledge_base",
"utterances": [
"What unique feature makes the Luminafox glow in the dark?",
"How does the Luminafox attract its prey?",
"Which folklore belief is associated with sighting a Luminafox?",
"What adaptation allows the Aquaflare to survive near volcanic isles?",
"What does the Aquaflare feed on in its extreme environment?",
"Give me 3 aquatic animals",
"Name 2 nocturnal creatures mentioned in the dataset",
"List 3 animals that can generate or store electricity",
"Which 2 creatures are known to glow or emit light?"
]
},
{
"name": "general",
"utterances": [
"What is the capital of France?",
"General knowledge question",
"Tell me about history",
"What is the meaning of life?",
"How does photosynthesis work?",
"What are the benefits of exercise?",
"Tell me about space exploration",
"What is machine learning?",
"How do plants grow?",
"What is the population of Tokyo?",
"How do I make a cake?",
"Why is the sky blue?"
]
}
]
1) Build Semantic Routing Components
Create the Semantic Router
The semantic router analyzes incoming queries and determines which specialized handler should process them. It uses embedding similarity to match queries against predefined route examples loaded from your JSON file.
Create modules/semantic_router.py:
Key Components:
Encoder: Uses the same embedding model as your retriever for consistency
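Score Threshold: 0.3 is the starting value; a route is selected only when its best similarity score reaches the threshold, so raising it sends more queries to the default route and lowering it makes route matches easier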
Route Examples: Loaded from your JSON configuration file
Default Route: Falls back to general responses when uncertain
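To make the matching logic concrete, here is a minimal, library-independent sketch of threshold-based routing. It is not the gllm API: embed is a toy bag-of-words stand-in for a real embedding model, and route_query, ROUTES, and the parameter names are hypothetical. The route entries follow the same name/utterances shape as route_examples.json.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def route_query(query, routes, threshold=0.3, default_route="general"):
    """Return (route_name, scores); fall back to default_route below the threshold."""
    q = embed(query)
    # Each route is scored by its most similar example utterance.
    scores = {
        r["name"]: max((cosine(q, embed(u)) for u in r["utterances"]), default=0.0)
        for r in routes
    }
    if not scores:
        return default_route, scores
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return default_route, scores
    return best, scores


# Hypothetical, shortened route examples in the route_examples.json shape.
ROUTES = [
    {"name": "knowledge_base", "utterances": [
        "How does the Luminafox attract its prey?",
        "Give me 3 aquatic animals",
    ]},
    {"name": "general", "utterances": [
        "What is the capital of France?",
        "Why is the sky blue?",
    ]},
]

if __name__ == "__main__":
    print(route_query("How does the Luminafox glow?", ROUTES))
```

With a real embedding model the scores would reflect meaning rather than word overlap, but the selection rule (best score, compared against the threshold, else default) stays the same.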
Create the General Query Handler
For queries that don't need knowledge base retrieval, we'll create a specialized response synthesizer that can answer general knowledge questions directly.
Create modules/handlers.py:
Key features:
Optimized for general knowledge questions
No retrieval needed - uses the model's built-in knowledge
Faster responses for queries that don't need your specific data
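As a rough sketch of what "no retrieval" means in practice, the handler below builds a prompt that contains only the question and passes it to a model callable. Everything here is an assumption for illustration: GENERAL_SYSTEM_PROMPT, build_general_prompt, answer_general, and the generate callable are hypothetical names, not part of the gllm libraries.

```python
from typing import Callable

# Hypothetical system prompt; tune the wording for your own application.
GENERAL_SYSTEM_PROMPT = (
    "You are a helpful assistant. Answer from general knowledge, concisely."
)


def build_general_prompt(query: str) -> str:
    """Build a prompt with no retrieved chunks, only the user's question."""
    return f"{GENERAL_SYSTEM_PROMPT}\n\nQuestion: {query.strip()}\nAnswer:"


def answer_general(query: str, generate: Callable[[str], str]) -> str:
    """Answer directly by calling the model; `generate` is any prompt -> text callable."""
    return generate(build_general_prompt(query)).strip()


if __name__ == "__main__":
    # Placeholder model so the sketch runs without network access.
    def fake_model(prompt: str) -> str:
        return " (model output would appear here) "

    print(answer_general("Why is the sky blue?", fake_model))
```

Because the retrieval step is skipped entirely, the only latency left on this path is the model call itself.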
Create the Enhanced Pipeline
Now we'll update your existing RAG pipeline to include semantic routing using a switch step that intelligently decides between knowledge base retrieval and general responses.
Create the general query step
This handles queries that don't need knowledge base retrieval:
Create the switch step
This is the core routing logic that decides which path to take:
How it works:
knowledge_base route: Triggers your full RAG pipeline (retrieval → synthesis)
general route: Uses the general handler for direct responses
default: Falls back to general responses if routing fails
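The dispatch behaviour described above can be sketched as a plain dictionary lookup with a fallback. This is a simplified stand-in, not the pipeline library's switch step: switch, rag_branch, general_branch, and the state keys are hypothetical names chosen for the example.

```python
from typing import Callable, Dict

State = Dict[str, str]
Branch = Callable[[State], State]


def rag_branch(state: State) -> State:
    """Placeholder for the full RAG path (retrieval -> synthesis)."""
    return {**state, "handler": "rag"}


def general_branch(state: State) -> State:
    """Placeholder for the direct, no-retrieval path."""
    return {**state, "handler": "general"}


def switch(state: State, branches: Dict[str, Branch], default: Branch) -> State:
    """Run the branch matching state['route']; use `default` if it is missing or unknown."""
    branch = branches.get(state.get("route", ""), default)
    return branch(state)


BRANCHES: Dict[str, Branch] = {
    "knowledge_base": rag_branch,
    "general": general_branch,
}

if __name__ == "__main__":
    print(switch({"query": "Give me 3 aquatic animals", "route": "knowledge_base"},
                 BRANCHES, general_branch))
```

Keeping the default outside the branch map means an unexpected route name degrades to a general answer instead of raising an error.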
Define the enhanced state and pipeline
Extend the RAG state and create the final pipeline:
This creates a pipeline that:
Analyzes the query with semantic routing
Either retrieves from your knowledge base OR answers directly
Returns the most appropriate response
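One way to picture "extending the RAG state" is a typed dictionary that inherits the existing fields and adds one for the routing decision. The field names below (user_query, chunks, response, route) are assumptions for illustration; use whatever keys your existing RAG state already defines.

```python
from typing import List, TypedDict


class RAGState(TypedDict, total=False):
    """Hypothetical base state carried through the RAG pipeline."""
    user_query: str
    chunks: List[str]
    response: str


class SemanticRouterState(RAGState, total=False):
    """Base state plus the route chosen by the semantic router."""
    route: str


def initial_state(query: str) -> SemanticRouterState:
    """Create a fresh state; the router fills in `route` before the switch runs."""
    return SemanticRouterState(user_query=query, chunks=[], response="", route="")


if __name__ == "__main__":
    print(initial_state("What is the capital of France?"))
```

Only the route field is new, so the existing retrieval and synthesis steps can keep reading and writing the same keys as before.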
🧠 The switch step acts as an intelligent dispatcher, seamlessly integrating your existing RAG pipeline with new routing capabilities.
2) Run the Pipeline
Configure and invoke the pipeline
Configure the state and config for direct pipeline invocation:
Test with knowledge base queries
Try these queries with debug: true to see the routing in action:
Knowledge Base Examples:
You should see in the debug logs that these get routed to the knowledge_base handler and trigger your RAG pipeline.
Test with general knowledge queries
Try these general knowledge questions:
General Knowledge Examples:
These should be routed to the general handler and get direct responses without retrieval.
Verify routing decisions
With debug: true, you should see logs showing:
Which route was selected
The similarity scores for each route. Observe how the similarity threshold affects routing decisions.
Which handler was executed
The specialized response format
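If you want a comparable one-line summary in your own experiments, a small helper like the one below formats a routing decision from a dictionary of scores. It is a hypothetical helper, not the library's debug output, and the scores in the example are made-up values used only to show the format.

```python
import logging

logger = logging.getLogger("semantic_router_demo")


def describe_decision(scores, threshold, default_route="general"):
    """Format which route wins, the threshold used, and every route's score."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best_score = ranked[0] if ranked else (default_route, 0.0)
    # Below the threshold the default route is used, mirroring the router's fallback.
    selected = best_name if best_score >= threshold else default_route
    details = ", ".join(f"{name}={score:.2f}" for name, score in ranked)
    message = f"selected={selected} threshold={threshold:.2f} scores: {details}"
    logger.debug(message)
    return message


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    # Illustrative scores only, not measured output.
    describe_decision({"knowledge_base": 0.68, "general": 0.2}, threshold=0.3)
```

Seeing all scores next to the threshold makes it easier to tell a near miss from a query that matches no route at all.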
Understanding the Flow
Here's what happens when a query comes in:
Query Analysis: The semantic router compares the incoming query against route examples from your JSON file using embedding similarity
Route Selection: The route with the highest similarity score (above the threshold) is selected
Switch Execution: The switch step executes the appropriate pipeline branch based on the selected route:
knowledge_base: Triggers your full RAG pipeline (retrieval → synthesis)
general: Uses the general handler for direct model responses
Pipeline Processing: Either your RAG components or general handler processes the query
Response Generation: The appropriate pipeline returns a response optimized for the query type
Troubleshooting
Routes not working as expected:
Check your route examples - they should be representative and diverse
Verify the similarity threshold isn't too high or too low
Add more examples for better classification
All queries going to default route:
Lower the similarity threshold
Add more diverse examples to your route categories
Check that your embedding model is working correctly
Queries always going to general route:
Check your route_examples.json: ensure knowledge base examples are specific and diverse and that the file is loaded correctly
Lower the score threshold in the encoder
Add more knowledge base examples that match your specific use cases
Use debug mode to see similarity scores and understand why routes aren't matching
Check the retry configuration: if an APIConnectionError or TimeoutError is raised, the timeout may be set too short
Wrong route selection:
Review and improve your route examples
Consider adding negative examples or adjusting thresholds
Use debug mode to see similarity scores
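When adjusting thresholds, it can help to record the similarity scores for a few queries whose correct route you know and then check several candidate thresholds against them. The sketch below does this in plain Python. accuracy_at and SAMPLES are hypothetical, and the scores are illustrative placeholders; replace them with scores copied from your own debug logs.

```python
def accuracy_at(threshold, samples, default_route="general"):
    """Fraction of samples routed correctly at the given threshold."""
    correct = 0
    for expected, scores in samples:
        best = max(scores, key=scores.get)
        predicted = best if scores[best] >= threshold else default_route
        correct += predicted == expected
    return correct / len(samples)


# (expected_route, scores per route); placeholder numbers, not measurements.
SAMPLES = [
    ("knowledge_base", {"knowledge_base": 0.62, "general": 0.20}),
    ("knowledge_base", {"knowledge_base": 0.35, "general": 0.30}),
    ("general", {"knowledge_base": 0.28, "general": 0.55}),
    ("general", {"knowledge_base": 0.32, "general": 0.10}),
]

if __name__ == "__main__":
    for t in (0.30, 0.34, 0.50):
        print(f"threshold={t:.2f} accuracy={accuracy_at(t, SAMPLES):.2f}")
```

In this toy data a threshold that is too low lets a weak knowledge_base match win, while one that is too high pushes a genuine knowledge base query to the default route, which is the trade-off to watch for in your own logs.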
Congratulations! You've successfully enhanced your RAG pipeline with semantic routing. Your pipeline now intelligently decides between using your knowledge base and providing general responses, making your application more efficient and delivering better user experiences.