RAG with Dynamic Models

This guide walks you through building a Pipeline whose model is selected at runtime instead of being fixed in code. This is useful for RAG applications where users can choose their own model.

This tutorial extends the Your First RAG Pipeline tutorial. Ensure you have followed the instructions to set up your repository.

Prerequisites

This example specifically requires:

Completion of all setup steps listed on the Prerequisites page.
An Elasticsearch vector data store that is already set up and available for use. Refer to Supported Vector Data Store and Index Your Data with Vector Data Store.

You must have already completed the following tutorial:

Your First RAG Pipeline

Installation

# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-rag gllm-core gllm-generation gllm-inference gllm-pipeline gllm-retrieval gllm-misc gllm-datastore

# Windows Command Prompt (you can use a Conda environment)
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-rag gllm-core gllm-generation gllm-inference gllm-pipeline gllm-retrieval gllm-misc gllm-datastore

You can either:

Download the complete guide files (see Complete Guide Files at the end of this page) and refer to the guide whenever you need an explanation or want to clarify how each part works.
Follow along with each step to recreate the files yourself while learning about the components and how to integrate them.

Both options work; choose based on whether you prefer speed or learning by doing.

Project Setup

Extend Your RAG Pipeline Project

Start with your completed RAG pipeline project from the previous tutorial. This guide adds no new files; instead, you will update the two files marked below:

<project-name>/
├── data/
│   ├── <index>/...
│   ├── chroma.sqlite3
│   └── imaginary_animals.csv
├── modules/
│   ├── retriever.py                
│   └── response_synthesizer.py     # 👈 Updated with dynamic model option
├── pipeline.py                     # 👈 Updated with dynamic model pipeline
├── indexer.py                      
└── .env

Make sure you have already set up the API keys for the models you want to try. Refer to Supported Models for the list of supported models.

1) Make the response synthesizer dynamic

To let users switch between models, we will make the response synthesizer dynamic by wrapping it in a builder function that takes the model ID as an argument, like so.

response_synthesizer.py

import os

from dotenv import load_dotenv
from gllm_generation.response_synthesizer import StuffResponseSynthesizer
from gllm_inference.builder import build_lm_request_processor

load_dotenv()


def build_response_synthesizer(model_id: str) -> StuffResponseSynthesizer:
    """Build a response synthesizer for the given model.

    Args:
        model_id (str): The model identifier to use for the LM request processor.

    Returns:
        StuffResponseSynthesizer: Synthesizer configured with SYSTEM_PROMPT and USER_PROMPT.
    """
    SYSTEM_PROMPT = """
    - Use only the information provided in the context below to answer the user's question. You may infer simple, logical conclusions based on the context, but do not introduce new facts or external knowledge.
    - You may suggest or summarize the options listed in the context if the question asks for recommendations, ideas, or what to do.
    - If the context does not contain enough information to answer the user's question, respond with:
    "Sorry, I don’t have enough information to answer that."

    Context:
    {context}
    """
    USER_PROMPT = "Question: {query}"

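    lm_request_processor = build_lm_request_processor(
        model_id=model_id,
        # NOTE: this key is OpenAI-specific. For models from other providers,
        # pass that provider's API key instead.
        credentials=os.environ["OPENAI_API_KEY"],
        system_template=SYSTEM_PROMPT,
        user_template=USER_PROMPT,
    )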
    response_synthesizer = StuffResponseSynthesizer(lm_request_processor=lm_request_processor)

    return response_synthesizer
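
The builder above reads OPENAI_API_KEY regardless of model_id, so as written it only works for OpenAI models. One way to support other providers is to pick the credential from the provider prefix of the model ID. The sketch below assumes model IDs follow the provider/model-name form used in this guide (for example, openai/gpt-4.1-nano). The helper name resolve_api_key, the mapping, and the non-OpenAI environment variable names are hypothetical; adjust them to match your .env file.

```python
import os

# Hypothetical mapping from provider prefix to the environment variable
# holding its API key. Only OPENAI_API_KEY appears in this guide.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def resolve_api_key(model_id: str) -> str:
    """Return the API key for the provider named in a 'provider/model-name' ID."""
    provider, sep, model_name = model_id.partition("/")
    if not sep or not model_name:
        raise ValueError(f"Expected 'provider/model-name', got {model_id!r}")
    env_var = PROVIDER_ENV_VARS.get(provider)
    if env_var is None:
        raise ValueError(f"Unsupported provider: {provider!r}")
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"{env_var} is not set; add it to your .env file")
    return api_key
```

Inside build_response_synthesizer, credentials=resolve_api_key(model_id) could then replace the hard-coded os.environ["OPENAI_API_KEY"] lookup.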

2) Make the pipeline dynamic

Next, we will make the pipeline itself dynamic.

Import the response synthesizer builder

Since it is the response synthesizer that will handle the calls to the different LMs, we change our import from using the prebuilt component to using the build_response_synthesizer function that we created in the previous step.

from modules.response_synthesizer import build_response_synthesizer

Wrap the pipeline inside a builder function

Similarly, we make the Pipeline dynamic by wrapping it inside a builder function, which builds a new pipeline with the selected model.

import asyncio
from gllm_pipeline.steps import step
from gllm_pipeline.pipeline import Pipeline
from modules.response_synthesizer import build_response_synthesizer
from modules.retriever import retriever

def build_pipeline(model_id: str) -> Pipeline:
    """Build the end-to-end pipeline.

    Args:
        model_id (str): Model identifier used to build the response synthesizer.

    Returns:
        Pipeline: A composed pipeline with an .invoke(state, config) coroutine method.
    """
    # The retriever step stays the same as in the previous tutorial
    retriever_step = step(
        retriever,
        {"query": "user_query"},
        "chunks",
        {"top_k": "top_k"},
    )

    # Build a response synthesizer for the selected model
    response_synthesizer = build_response_synthesizer(model_id)
    response_synthesizer_step = step(
        component=response_synthesizer,
        input_map={
            "query": "user_query",
            "chunks": "chunks",
        },
        output_state="response",
    )
    return retriever_step | response_synthesizer_step
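
Because build_pipeline creates a new pipeline on every call, an application that serves many requests may prefer to build each model's pipeline once and reuse it. A minimal sketch using functools.lru_cache follows. The build_pipeline shown here is a stand-in that returns a placeholder string so the snippet runs without the SDK, and get_pipeline is a hypothetical helper name. Whether a pipeline instance is safe to reuse across requests is an assumption you should verify for your own components.

```python
from functools import lru_cache


def build_pipeline(model_id: str) -> str:
    """Stand-in for the real builder in pipeline.py (returns a placeholder)."""
    print(f"Building pipeline for {model_id}")
    return f"pipeline<{model_id}>"


@lru_cache(maxsize=8)
def get_pipeline(model_id: str) -> str:
    """Build the pipeline for a model once, then reuse it on later calls."""
    return build_pipeline(model_id)


first = get_pipeline("openai/gpt-4.1-nano")   # calls build_pipeline
second = get_pipeline("openai/gpt-4.1-nano")  # served from the cache
```

The maxsize argument bounds how many distinct models are kept in memory; the least recently used entry is evicted once the limit is reached.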

3) Run the pipeline

To run the pipeline, we modify the main block as follows:

if __name__ == "__main__":
    model_id = "openai/gpt-4.1-nano"  # Change this to whatever you want
    e2e_pipeline = build_pipeline(model_id)
    state = {
        "user_query": "Give me nocturnal creature from the dataset",  # Replace with your actual query
    }

    config = {
        "top_k": 5,
        "debug": True,  # Set to True to look at the pipeline execution flow
    }

    result = asyncio.run(e2e_pipeline.invoke(state, config))
    print(f"Response: {result['response']}")

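You should get a response in a format similar to the one below. This sample answers a question about aquatic animals; your output will depend on your query, data, and selected model.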

Response: Here are three aquatic animals from the provided context:

1. Aquaflare - A marine creature found near the volcanic isles of Pyronia, with heat-resistant scales and the ability to withstand extreme temperatures.

2. Starburst Lionfish - A solitary fish living in the coral reefs of Celestial Sea, known for its luminescent fins and mild toxin.

3. Aquaglow Jelly - A translucent bioluminescent jellyfish that drifts in the depths of Azure Lake, feeding on microscopic organisms.
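
Since the model is now a parameter, the same query can be sent to several models for a side-by-side comparison. The sketch below uses asyncio.gather to run them concurrently. FakePipeline is a hypothetical stand-in that mimics only the invoke(state, config) coroutine used above, so the snippet runs without the SDK or API keys; in your project, build_pipeline from pipeline.py would take its place. The second model ID is only an example.

```python
import asyncio


class FakePipeline:
    """Hypothetical stand-in exposing the same async invoke(state, config)."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id

    async def invoke(self, state: dict, config: dict) -> dict:
        await asyncio.sleep(0)  # simulate an awaitable model call
        return {"response": f"[{self.model_id}] answer to: {state['user_query']}"}


def build_pipeline(model_id: str) -> FakePipeline:
    """Stand-in for the real builder in pipeline.py."""
    return FakePipeline(model_id)


async def compare_models(model_ids: list[str], state: dict, config: dict) -> dict[str, str]:
    """Run the same query through one pipeline per model, concurrently."""
    pipelines = [build_pipeline(model_id) for model_id in model_ids]
    # Give each run its own copy of the state in case a pipeline mutates it.
    results = await asyncio.gather(*(p.invoke(dict(state), config) for p in pipelines))
    return {model_id: result["response"] for model_id, result in zip(model_ids, results)}


if __name__ == "__main__":
    answers = asyncio.run(
        compare_models(
            ["openai/gpt-4.1-nano", "openai/gpt-4.1-mini"],  # example model IDs
            {"user_query": "Give me nocturnal creature from the dataset"},
            {"top_k": 5},
        )
    )
    for model_id, answer in answers.items():
        print(f"{model_id}: {answer}")
```

asyncio.gather returns results in the order the coroutines were passed in, which is why zipping them with model_ids pairs each response with the right model.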

📂 Complete Guide Files

dynamic-model-250923.zip (4MB)
