Implement Semantic Routing

This guide will walk you through setting up semantic routing in your RAG pipeline to intelligently route different types of queries to specialized handlers.

Prerequisites

This tutorial specifically requires the packages installed below.

Installation

# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-rag gllm-core gllm-generation gllm-inference gllm-pipeline gllm-retrieval gllm-misc

Set Up Your Project

Prepare your repository

Let’s prepare your workspace step by step.

1. Create a new project folder:

mkdir my-semantic-routing-pipeline
cd my-semantic-routing-pipeline
2. Prepare your .env file:

Create a file named .env in your project directory with the following content:

EMBEDDING_MODEL="text-embedding-3-small"
LANGUAGE_MODEL="gpt-4.1"
OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"

This is an example .env file. You may adjust the variables according to your needs.

3. Arrange your project structure to include the semantic routing components:

my-semantic-routing-pipeline/
├── modules/
│   ├── __init__.py
│   ├── semantic_router.py          # 👈 New
│   └── handlers.py                 # 👈 New
├── router_pipeline.py              # 👈 New
└── main.py

Build Semantic Routing Components

Now let's build the components that will enable intelligent query routing.

Create the Semantic Router

The semantic router analyzes incoming queries and determines which specialized handler should process them. It uses embedding similarity to match queries against predefined route examples.

1. Load environment settings and dependencies

Create modules/semantic_router.py and start with the basic imports:

import os
from dotenv import load_dotenv
from gllm_misc.router.similarity_based_router import SimilarityBasedRouter
from gllm_inference.em_invoker import OpenAIEMInvoker

load_dotenv()
2. Set up the embedding model for routing

The semantic router needs an embedding model to understand query meanings:

em_invoker_openai = OpenAIEMInvoker(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",
)

🧠 We use the same embedding model as in your retriever for consistency.

3. Define route examples

This is the core of semantic routing: you define example queries for each route category:

def semantic_router_component():
    # Define route examples for different categories
    route_examples = {
        "code_generation": [
            "Write a Python script that reads a CSV file, filters rows where the 'status' column is 'active', and saves the result to a new CSV",
            "Generate a Java function that takes a list of integers and returns a new list containing only the prime numbers",
            "Create a SQL query to join two tables: orders and customers, returning the customer name, order date, and total amount for orders placed in the last 30 days",
            "Generate a Dockerfile for a Flask application running on Python 3.11, exposing port 5000",
            "Write a Python code to sort a dataframe based on the 'date' and 'value' columns",
            "Write a Python code to calculate the average of a list of numbers",
            "Write a Python code to calculate the median of a list of numbers",
            "Write a Python code to calculate the mode of a list of numbers",
            "Write a Python code to calculate the standard deviation of a list of numbers",
            "Write a Python code to calculate the variance of a list of numbers",
            "Write a Python code to calculate the correlation between two lists of numbers",
        ],
        "general": [
            "What is the capital of France?",
            "General knowledge question",
            "Tell me about history",
            "What is the meaning of life?",
            "How does photosynthesis work?",
            "What are the benefits of exercise?",
            "Tell me about space exploration",
            "What is machine learning?",
            "How do plants grow?",
            "What is the population of Tokyo?"
        ]
    }

How it works:

  • The router compares incoming queries against these examples using embedding similarity

  • More examples = better routing accuracy

  • Examples should be diverse and representative of each category
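Under the hood, similarity-based routing boils down to comparing embedding vectors. Here is a toy sketch with hand-made 2-D vectors standing in for real embeddings; function names such as `best_route` are illustrative, not part of the SDK:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_route(query_vec, route_example_vecs):
    # For each route, take the highest similarity across its example vectors,
    # then pick the route with the best score.
    scores = {
        route: max(cosine(query_vec, ex) for ex in examples)
        for route, examples in route_example_vecs.items()
    }
    return max(scores, key=scores.get), scores

# Toy 2-D "embeddings" in place of real model outputs.
route_example_vecs = {
    "code_generation": [(0.9, 0.1), (0.8, 0.3)],
    "general": [(0.1, 0.9), (0.2, 0.8)],
}
route, scores = best_route((0.85, 0.2), route_example_vecs)
print(route)  # code_generation
```

With real embeddings the vectors have hundreds of dimensions, but the matching logic is the same.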

4. Create the similarity-based router

Finally, instantiate the router with your configuration:

    similarity_router = SimilarityBasedRouter(
        em_invoker=em_invoker_openai,
        route_examples=route_examples,
        default_route="general",
        similarity_threshold=0.6
    )

    return similarity_router

Parameters explained:

  • em_invoker: The embedding model for calculating similarities

  • route_examples: Your predefined examples for each route

  • default_route: Fallback route when no good match is found

  • similarity_threshold: Minimum similarity score to match a route (0.6 = 60% similarity)
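The interplay of similarity_threshold and default_route can be sketched in plain Python. This is a simplified model; the real SimilarityBasedRouter may score routes differently:

```python
def select_route(route_scores, default_route, similarity_threshold):
    """Pick the best-scoring route, falling back to the default below the threshold."""
    best = max(route_scores, key=route_scores.get)
    if route_scores[best] >= similarity_threshold:
        return best
    return default_route

# A confident match goes to its route; a weak match falls back to the default.
print(select_route({"code_generation": 0.82, "general": 0.41}, "general", 0.6))  # code_generation
print(select_route({"code_generation": 0.45, "general": 0.52}, "general", 0.6))  # general
```

This is why a threshold that is too high sends everything to the default route, while one that is too low can misroute borderline queries.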

Create Specialized Handlers

Different types of queries need different handling approaches. Let's create specialized response synthesizers for each route type.

1. Create the handlers file

Create modules/handlers.py with the necessary imports:

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_generation.response_synthesizer import StuffResponseSynthesizer
from gllm_inference.request_processor import LMRequestProcessor
from gllm_inference.prompt_builder import PromptBuilder
2. Create the code generation handler

This handler is optimized for generating code responses:

def code_generation_handler() -> StuffResponseSynthesizer:
    """Create a step that handles technical queries."""

    lm_invoker = OpenAILMInvoker(model_name="gpt-4.1")

    system_template = """You are a helpful assistant that provides code based on the user's query.
        Be concise and to the point. Answer with only the code, no other text."""

    prompt_builder = PromptBuilder(
        system_template=system_template,
        user_template="User's query: {query}",
    )

    return StuffResponseSynthesizer(
        LMRequestProcessor(
            lm_invoker=lm_invoker,
            prompt_builder=prompt_builder,
        ),
    )

Key features:

  • Uses a specialized system prompt for code generation

  • Configured to return concise, code-focused responses

  • No retrieval needed: pure generation

3. Create the general query handler

This handler is optimized for general knowledge questions:

def general_query_handler() -> StuffResponseSynthesizer:
    """Create a step that handles general queries."""

    lm_invoker = OpenAILMInvoker(model_name="gpt-4.1")

    system_template = """You are a helpful assistant that provides accurate and informative answers to general knowledge questions.
        Be concise but thorough in your responses."""

    prompt_builder = PromptBuilder(
        system_template=system_template,
        user_template="Question: {query}",
    )

    return StuffResponseSynthesizer(
        LMRequestProcessor(
            lm_invoker=lm_invoker,
            prompt_builder=prompt_builder,
        )
    )

Key differences:

  • Different system prompt optimized for general knowledge

  • Encourages thorough but concise responses

  • Could be extended to use different models or retrieval strategies

Build the Pipeline

Now we'll create a new pipeline that combines semantic routing with conditional execution.

Understanding Conditional Steps

A ConditionalStep allows your pipeline to make decisions about which path to take based on runtime conditions. Here's how it works:

  1. Router Step: Analyzes the query and determines the route

  2. Conditional Step: Uses the route to decide which handler to execute

  3. Handler Execution: Runs the appropriate specialized handler
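Conceptually, a conditional step is dictionary dispatch: the route name returned by the condition selects which callable runs. A minimal stand-alone sketch (not the gllm_pipeline API; all names here are illustrative):

```python
def fake_router(query):
    # Stand-in for the semantic router's decision.
    return "code_generation" if "code" in query.lower() else "general"

branches = {
    "code_generation": lambda q: f"[code handler] {q}",
    "general": lambda q: f"[general handler] {q}",
}

def conditional_step(query, branches, condition, default="general"):
    # 1. The condition (router) picks a route.
    route = condition(query)
    # 2. The route name selects a branch, falling back to the default.
    handler = branches.get(route, branches[default])
    # 3. The selected handler processes the query.
    return handler(query)

print(conditional_step("Write Python code to sort a list", branches, fake_router))
```

The real ConditionalStep does the same selection, but wires the inputs and outputs through the pipeline state instead of plain function arguments.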

Create the Routing Pipeline

1. Create the routing pipeline file

Create router_pipeline.py with the necessary imports:

from gllm_pipeline.pipeline.pipeline import Pipeline
from gllm_pipeline.pipeline.states import RAGState
from gllm_pipeline.steps import step
from gllm_pipeline.steps.conditional_step import ConditionalStep

from modules import (
    code_generation_handler,
    general_query_handler,
    semantic_router_component,
)
2. Define custom state

Extend the default RAGState to include routing information:

class RouterState(RAGState):
    route: str

This adds a route field to track which route was selected.
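If RAGState behaves like a TypedDict (an assumption worth checking against the gllm_pipeline source), the extension is analogous to:

```python
from typing import TypedDict

class RAGState(TypedDict):
    # Stand-in for gllm_pipeline's RAGState; the fields shown are illustrative.
    user_query: str
    response: str

class RouterState(RAGState):
    route: str  # which route the semantic router selected

state: RouterState = {"user_query": "hi", "response": "", "route": "general"}
print(state["route"])  # general
```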

3. Create component instances

Instantiate all the components you'll need:

semantic_router_component = semantic_router_component()
code_generation_component = code_generation_handler()
general_query_component = general_query_handler()
4. Create individual handler steps

Define steps for each specialized handler:

code_generation_step = step(
    code_generation_component,
    {"query": "user_query"},
    "response",
)

general_query_step = step(
    general_query_component,
    {"query": "user_query"},
    "response",
)

Both steps take the user query and output a response, but use different handlers.

5. Create the conditional step

This is where the routing decision takes effect: the conditional step chooses which handler to execute:

conditional_step = ConditionalStep(
    name="conditional_step",
    branches={"code_generation": code_generation_step, "general": general_query_step},
    condition=semantic_router_component,
    input_state_map={"source": "user_query"},
    output_state="response",
)

Parameters explained:

  • branches: Maps route names to their corresponding steps

  • condition: The router component that determines which branch to take

  • input_state_map: Passes the user query to the router for decision making

  • output_state: Where to store the final response

6. Compose the final pipeline

Connect the router and conditional steps:

e2e_pipeline_with_semantic_router = Pipeline([conditional_step], state_type=RouterState)

This creates a pipeline that:

  1. Routes the query to determine the appropriate handler

  2. Conditionally executes the selected handler

  3. Returns the specialized response

Run the Application

Now let's test the semantic routing functionality with different types of queries.

1. Create the API in main.py

Download this script and save it as main.py in your project directory.

2. Start your server

Run your FastAPI server as before:

poetry run uvicorn main:app --reload

You should see something like:

INFO:     Uvicorn running on http://127.0.0.1:8000
3. Test Your RAG Pipeline via API

To test your app, download and run this run.py script.

4. Test with code generation queries

Modify the prompts in the run.py file. Try these queries with debug: true to see the routing in action:

Code Generation Examples:

{
  "user_query": "Write a Python function to calculate factorial",
  "debug": true
}
{
  "user_query": "Create a SQL query to find all users who logged in today",
  "debug": true
}

You should see in the debug logs that these get routed to the code_generation handler.

5. Test with general knowledge queries

Try these general knowledge questions:

General Knowledge Examples:

{
  "user_query": "What is the capital of Japan?",
  "debug": true
}
{
  "user_query": "How does photosynthesis work?",
  "debug": true
}

These should be routed to the general handler.

6. Verify routing decisions

With debug: true, you should see logs showing:

  • Which route was selected

  • The similarity scores for each route. Observe how the similarity threshold affects routing decisions.

  • Which handler was executed

  • The specialized response format

Example debug output:

Starting pipeline
[Start 'SimilarityBasedRouter'] Routing input source: 'Generate python code calculate the average of a list of numbers'
[Finished 'SimilarityBasedRouter'] Successfully selected route: 'code_generation'
[Start 'StuffResponseSynthesizer'] Processing query: 'Generate python code calculate the average of a list of numbers'

Understanding the Flow

Here's what happens when a query comes in:

  1. Query Analysis: The semantic router compares the incoming query against all route examples using embedding similarity

  2. Route Selection: The route with the highest similarity score (above the threshold) is selected

  3. Conditional Execution: The ConditionalStep executes the appropriate handler based on the selected route

  4. Specialized Processing: The specialized handler processes the query with its optimized prompt and model configuration

  5. Response Generation: The handler returns a response tailored to the query type
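The five steps above can be strung together in a toy end-to-end sketch, with keyword overlap standing in for embedding similarity (all names here are illustrative, not part of the SDK):

```python
def route_query(query, route_keywords, threshold=0.5, default="general"):
    # Steps 1-2: analyze the query and select the best-scoring route,
    # falling back to the default when no score clears the threshold.
    words = set(query.lower().split())
    scores = {
        route: len(words & kws) / max(len(kws), 1)
        for route, kws in route_keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else default

def handle(query, route):
    # Steps 3-5: conditionally execute the specialized handler for the route.
    handlers = {
        "code_generation": lambda q: f"# solution for: {q}",
        "general": lambda q: f"Answer: {q}",
    }
    return handlers[route](query)

route_keywords = {"code_generation": {"write", "code", "python"}, "general": {"what", "how"}}
query = "Write Python code to reverse a string"
route = route_query(query, route_keywords)
print(route)  # code_generation
print(handle(query, route))
```

The production pipeline replaces the keyword overlap with embedding similarity and the lambda handlers with the specialized response synthesizers, but the control flow is the same.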

Troubleshooting

  1. Routes not working as expected:

    1. Check your route examples - they should be representative and diverse

    2. Verify the similarity threshold isn't too high or too low

    3. Add more examples for better classification

  2. All queries going to default route:

    1. Lower the similarity threshold

    2. Add more diverse examples to your route categories

    3. Check that your embedding model is working correctly

  3. Wrong route selection:

    1. Review and improve your route examples

    2. Consider adding negative examples or adjusting thresholds

    3. Use debug mode to see similarity scores

📂 Complete Tutorial Files

Coming soon!


Congratulations! You've successfully implemented semantic routing in your RAG pipeline. This intelligent routing system will help you deliver more relevant and specialized responses based on the type of query your users submit.
