Your First RAG Pipeline
This guide will walk you through setting up a basic RAG pipeline.
Installation
macOS/Linux:

```bash
# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" python-dotenv gllm-core gllm-generation gllm-inference gllm-pipeline gllm-retrieval gllm-datastore
```

Windows (CMD):

```bat
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" python-dotenv gllm-core gllm-generation gllm-inference gllm-pipeline gllm-retrieval gllm-datastore
```

How to Use this Guide
You can either:
Download or copy the complete guide file(s) to get everything ready instantly by heading to the 📂 Complete Guide Files section at the end of this page. You can refer to the guide whenever you need an explanation or want to clarify how each part works.
Follow along with each step to recreate the files yourself while learning about the components and how to integrate them.
Both options work; choose based on whether you prefer speed or learning by doing!
Project Setup
Folder Structure
Start by organizing your files (if you have downloaded the Complete Guide Files, you can proceed to the next step). This is a minimal folder structure you can follow, though you may adjust it to your needs:
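For reference, a layout consistent with the files created in this guide might look like the following (the project and database folder names are illustrative):

```
your-project/
├── .env
├── pipeline.py
├── modules/
│   ├── retriever.py
│   └── response_synthesizer.py
└── database/   # the downloaded ChromaDB/SQLite files go here
```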
Prepare your .env file:
Ensure you have a file named .env in your project directory with the following content:
```
EMBEDDING_MODEL="text-embedding-3-small"
LANGUAGE_MODEL="openai/gpt-5-nano"
OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
```

1) Index Your Data
Download the database
For this guide, we provide a preset SQLite database preloaded with chunks from imaginary_animals.csv. You can download it here.
Arrange the files
Arrange them in your project. You can follow the structure in Project Setup section.
You may use another knowledge base file and adjust accordingly. For more information about indexing data into a data store, see Index Your Data with Vector Data Store.
2) Build Core Components of Your Pipeline
Create the Retriever
The Retriever finds and pulls useful information from your ChromaDB database. Create modules/retriever.py:
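As a starting point, here is a minimal sketch of `modules/retriever.py`. The class names (`OpenAIEMInvoker`, `ChromaVectorDataStore`, `BasicVectorRetriever`) come from this guide, but the import paths, constructor arguments, collection name, and storage path are assumptions; adapt them to the actual GL SDK API and your setup.

```python
# modules/retriever.py
# A minimal sketch. Import paths, constructor arguments, collection name,
# and storage path below are assumptions, not verbatim from the SDK reference.
import os

from dotenv import load_dotenv
from gllm_datastore.vector_data_store import ChromaVectorDataStore  # assumed path
from gllm_inference.em_invoker import OpenAIEMInvoker  # assumed path
from gllm_retrieval.retriever import BasicVectorRetriever  # assumed path

# Environment Loading: read settings from your .env file
load_dotenv()

# Embedding Model: converts text into vector embeddings for similarity search
em_invoker = OpenAIEMInvoker(
    model_name=os.getenv("EMBEDDING_MODEL"),
    api_key=os.getenv("OPENAI_API_KEY"),
)

# Data Store: connects to your local ChromaDB with persistent storage
data_store = ChromaVectorDataStore(
    collection_name="imaginary_animals",  # assumed collection name
    persist_directory="database",  # assumed path to the downloaded database
    em_invoker=em_invoker,
)

# Retriever: performs vector similarity search to find relevant documents
retriever = BasicVectorRetriever(data_store=data_store)
```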
Key Components Explained:

- Environment Loading: Load settings from your `.env` file.
- Embedding Model: `OpenAIEMInvoker` converts text into vector embeddings for similarity search.
- Data Store: `ChromaVectorDataStore` connects to your local ChromaDB with persistent storage.
- Retriever: `BasicVectorRetriever` performs vector similarity search to find relevant documents.
Create the Response Synthesizer
The response synthesizer generates the final answer using the retrieved context and user query.
Create modules/response_synthesizer.py:
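Here is a minimal sketch of `modules/response_synthesizer.py`. The `build_lm_request_processor()` helper and `ResponseSynthesizer.static_list` are named in this guide, but their import paths, parameter names, and the prompt templates below are assumptions; check them against the GL SDK reference.

```python
# modules/response_synthesizer.py
# A minimal sketch. Import paths, parameter names, and prompt templates
# are assumptions — verify against the GL SDK reference.
import os

from dotenv import load_dotenv
from gllm_generation.response_synthesizer import ResponseSynthesizer  # assumed path
from gllm_inference.builder import build_lm_request_processor  # assumed path

load_dotenv()

# System Prompt: use only the provided context; degrade gracefully otherwise
SYSTEM_PROMPT = (
    "Answer the user's question using only the provided context. "
    "If the context is insufficient, say that you don't know."
)

# User Prompt: templates the user's question and the retrieved chunks
USER_PROMPT = "Context:\n{chunks}\n\nQuestion: {query}"

# LM Request Processor: built with the helper function for simplified setup
lm_request_processor = build_lm_request_processor(
    model_id=os.getenv("LANGUAGE_MODEL"),  # e.g. "openai/gpt-5-nano"
    credentials=os.getenv("OPENAI_API_KEY"),
    system_template=SYSTEM_PROMPT,
    user_template=USER_PROMPT,
)

# Response Synthesizer: combines all given chunks into a single prompt
response_synthesizer = ResponseSynthesizer.static_list(
    lm_request_processor=lm_request_processor,
)
```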
Key Components:

- System Prompt: Instructs the model to use only the provided context and handle insufficient information gracefully.
- User Prompt: Templates the user's question for the model.
- LM Request Processor: Built using the `build_lm_request_processor()` helper function for simplified setup.
- Response Synthesizer: `ResponseSynthesizer.static_list` combines all given chunks into a single prompt for response generation.
3) Build the Pipeline
We'll build the full process in your pipeline.py file using GL SDK's pipeline. Open the file and follow these instructions to create steps and compose them:
Import the helpers and components
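For example (the `step` helper's module path is an assumption; the component imports match the modules created above):

```python
# pipeline.py — imports; the step helper's module path is an assumption
from gllm_pipeline.steps import step  # assumed path

from modules.retriever import retriever
from modules.response_synthesizer import response_synthesizer
```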
Create the Retriever Step
This component step searches for relevant chunks based on the user's query.
Here, the `query` input takes its value from the user input (`user_query`). We also configure `top_k` to control how many results are retrieved.
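A sketch of the retriever step, assuming a `step()` helper; its signature and key names are inferred from the mappings described in this guide:

```python
# A sketch of the retriever step; signature and key names are assumptions.
retriever_step = step(
    component=retriever,
    input_state_map={"query": "user_query"},  # query <- user input in the state
    output_state="chunks",  # retrieved chunks are written here
    fixed_args={"top_k": 5},  # how many results to retrieve (assumed mechanism)
)
```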
Create the Response Synthesizer Step
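A sketch of this step under the same assumed `step()` helper; the input mappings follow the Note below:

```python
# A sketch of the response synthesizer step; the signature is an assumption,
# and the input mappings follow the Note below.
response_synthesizer_step = step(
    component=response_synthesizer,
    input_state_map={
        "query": "user_query",
        "chunks": "chunks",  # primary data flow in the state
        "kwargs": "chunks",  # validation compatibility: kwargs must not be empty
    },
    output_state="response",
)
```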
Key Components:

- Response Synthesizer: `ResponseSynthesizer.stuff()` combines all context into a single prompt for response generation.

Note:

- `"chunks": "chunks"` is the primary data flow in the state.
- `"kwargs": "chunks"` is for validation compatibility, since `kwargs` must not be empty.
Connect Everything into a Pipeline
Finally, use the pipe operator (`|`) to chain all steps in order:
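Conceptually, using the step names from the sketches above:

```python
# Chain the steps in order; each step reads from and writes to the shared state
pipeline = retriever_step | response_synthesizer_step
```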
4) Run the Pipeline
Configure and invoke the pipeline
Configure the state and config for direct pipeline invocation:
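A sketch of direct invocation at the bottom of `pipeline.py`. Whether the pipeline exposes an async `invoke()` and the exact state/config keys are assumptions to check against the SDK:

```python
# At the bottom of pipeline.py — a sketch of direct invocation.
# invoke() being async and the state/config keys are assumptions.
import asyncio

if __name__ == "__main__":
    state = {"user_query": "Tell me about one of the imaginary animals."}
    config = {"top_k": 5}  # runtime configuration

    result = asyncio.run(pipeline.invoke(state, config))
    print(result.get("response", result))
```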
Run pipeline.py file
Observe output
If you have run all the steps successfully, the pipeline will print a generated answer grounded in the retrieved chunks.
Congratulations! You've successfully built your first RAG pipeline.