Implement Semantic Routing

This guide will walk you through adding semantic routing to your existing RAG pipeline to intelligently route different types of queries to specialized handlers.

Semantic routing allows your pipeline to automatically decide whether a query needs knowledge base retrieval or can be answered with general knowledge, making your application more efficient and providing better responses.

This tutorial extends the Your First RAG Pipeline tutorial. Ensure you have followed the instructions to set up your repository.

Prerequisites

This example specifically requires:

  1. Completion of the Your First RAG Pipeline tutorial - this builds directly on top of it

  2. Completion of all setup steps listed on the Prerequisites page

  3. A working OpenAI API key configured in your environment variables

You should be familiar with these concepts and components:

  1. Components in Your First RAG Pipeline - Required foundation

  2. switch step

View full project code on GitHub

Installation

# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-rag gllm-core gllm-generation gllm-inference gllm-pipeline gllm-retrieval gllm-misc gllm-datastore

You can either:

  1. You can refer to the guide whenever you need explanation or want to clarify how each part works.

  2. Follow along with each step to recreate the files yourself while learning about the components and how to integrate them.

Both options will work—choose based on whether you prefer speed or learning by doing!

Project Setup

1

Extend Your RAG Pipeline Project

Start with your completed RAG pipeline project from the Your First RAG Pipeline tutorial. You'll extend the project structure as follows:

<project-name>/
├── data/
│   ├── <index>/...                     # preset data index folder
│   ├── chroma.sqlite3                  # preset database file
│   ├── imaginary_animals.csv           # sample data
├── modules/
│   ├── retriever.py
│   └── response_synthesizer.py
├── .env
├── indexer.py                    
├── route_examples.json                    
└── pipeline.py            # 👈 Will be updated with semantic router
2

Create the route examples file

Create a new file route_examples.json to define your routing categories:

[
  {
    "name": "knowledge_base",
    "utterances": [
      "What unique feature makes the Luminafox glow in the dark?",
      "How does the Luminafox attract its prey?",
      "Which folklore belief is associated with sighting a Luminafox?",
      "What adaptation allows the Aquaflare to survive near volcanic isles?",
      "What does the Aquaflare feed on in its extreme environment?",
      "Give me 3 aquatic animals",
      "Name 2 nocturnal creatures mentioned in the dataset",
      "List 3 animals that can generate or store electricity",
      "Which 2 creatures are known to glow or emit light?"
    ]
  },
  {
    "name": "general", 
    "utterances": [
      "What is the capital of France?",
      "General knowledge question",
      "Tell me about history",
      "What is the meaning of life?",
      "How does photosynthesis work?",
      "What are the benefits of exercise?",
      "Tell me about space exploration",
      "What is machine learning?",
      "How do plants grow?",
      "What is the population of Tokyo?",
      "How do I make a cake?",
      "Why is the sky blue?"
    ]
  }
]

The route examples define which types of queries should go to your knowledge base vs. general knowledge. Customize these based on your specific use case.


1) Build Semantic Routing Components

Create the Semantic Router

The semantic router analyzes incoming queries and determines which specialized handler should process them. It uses embedding similarity to match queries against predefined route examples loaded from your JSON file.

Create modules/semantic_router.py:

Key Components:

  • Encoder: Uses the same embedding model as your retriever for consistency

  • Score Threshold: 0.3 provides balanced routing sensitivity

  • Route Examples: Loaded from your JSON configuration file

  • Default Route: Falls back to general responses when uncertain

Create the General Query Handler

For queries that don't need knowledge base retrieval, we'll create a specialized response synthesizer that can answer general knowledge questions directly.

Create modules/handlers.py:

Key features:

  • Optimized for general knowledge questions

  • No retrieval needed - uses the model's built-in knowledge

  • Faster responses for queries that don't need your specific data

Create the Enhanced Pipeline

Now we'll update your existing RAG pipeline to include semantic routing using a switch step that intelligently decides between knowledge base retrieval and general responses.

1

Create the general query step

This handles queries that don't need knowledge base retrieval:

2

Create the switch step

This is the core routing logic that decides which path to take:

How it works:

  • knowledge_base route: Triggers your full RAG pipeline (retrieval → synthesis)

  • general route: Uses the general handler for direct responses

  • default: Falls back to general responses if routing fails

3

Define the enhanced state and pipeline

Extend the RAG state and create the final pipeline:

This creates a pipeline that:

  1. Analyzes the query with semantic routing

  2. Either retrieves from your knowledge base OR answers directly

  3. Returns the most appropriate response

🧠 The switch step acts as an intelligent dispatcher, seamlessly integrating your existing RAG pipeline with new routing capabilities.

2) Run the Pipeline

When running the pipeline, you may encounter an error like this:

[2025-08-26T14:36:10+0700.550 chromadb.telemetry.product.posthog ERROR] Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given

Don't worry about this, since we do not use this Chroma feature. Your Pipeline should still work.

1

Configure and invoke the pipeline

Configure the state and config for direct pipeline invocation:

2

Test with knowledge base queries

Try these queries with debug: true to see the routing in action:

Knowledge Base Examples:

You should see in the debug logs that these get routed to the knowledge_base handler and trigger your RAG pipeline.

3

Test with general knowledge queries

Try these general knowledge questions:

General Knowledge Examples:

These should be routed to the general handler and get direct responses without retrieval.

4

Verify routing decisions

With debug: true, you should see logs showing:

  • Which route was selected

  • The similarity scores for each route. Observe how the similarity threshold affects routing decisions.

  • Which handler was executed

  • The specialized response format

Understanding the Flow

Here's what happens when a query comes in:

  1. Query Analysis: The semantic router compares the incoming query against route examples from your JSON file using embedding similarity

  2. Route Selection: The route with the highest similarity score (above the threshold) is selected

  3. Switch Execution: The switch step executes the appropriate pipeline branch based on the selected route:

    • knowledge_base: Triggers your full RAG pipeline (retrieval → synthesis)

    • general: Uses the general handler for direct model responses

  4. Pipeline Processing: Either your RAG components or general handler processes the query

  5. Response Generation: The appropriate pipeline returns a response optimized for the query type

Troubleshooting

  1. Routes not working as expected:

    1. Check your route examples - they should be representative and diverse

    2. Verify the similarity threshold isn't too high or too low

    3. Add more examples for better classification

  2. All queries going to default route:

    1. Lower the similarity threshold

    2. Add more diverse examples to your route categories

    3. Check that your embedding model is working correctly

  3. Queries always going to general route:

    1. Check your route_examples.json - ensure knowledge base examples are specific and diverse and the file is correctly imported

    2. Lower the score threshold in the encoder

    3. Add more knowledge base examples that match your specific use cases

    4. Use debug mode to see similarity scores and understand why routes aren't matching

    5. Check retry configuration (if a APIConnectionError or TimeoutError is raised, it might due to short timeout configuration)

  4. Wrong route selection:

    1. Review and improve your route examples

    2. Consider adding negative examples or adjusting thresholds

    3. Use debug mode to see similarity scores


Congratulations! You've successfully enhanced your RAG pipeline with semantic routing. Your pipeline now intelligently decides between using your knowledge base and providing general responses, making your application more efficient and delivering better user experiences.

Last updated