Your First RAG Pipeline
This guide will walk you through setting up a basic RAG pipeline.
Installation
macOS/Linux:

```bash
# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" python-dotenv gllm-core gllm-generation gllm-inference gllm-pipeline gllm-retrieval gllm-datastore
```

Windows (CMD):

```bat
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" python-dotenv gllm-core gllm-generation gllm-inference gllm-pipeline gllm-retrieval gllm-datastore
```

How to Use this Guide
You can either:
Download or copy the complete guide file(s) to get everything ready instantly by heading to the 📂 Complete Guide Files section at the end of this page. You can refer to the guide whenever you need an explanation or want to clarify how each part works.
Follow along with each step to recreate the files yourself while learning about the components and how to integrate them.
Both options work; choose based on whether you prefer speed or learning by doing!
Project Setup
Folder Structure
Start by organizing your files (if you have downloaded the Complete Guide Files, you can proceed to the next step). This is a minimal folder structure you can follow, though you may adjust it to your needs:
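For reference, a layout consistent with the files created in this guide might look like the following (the project and database folder names are illustrative):

```
your-project/
├── .env
├── pipeline.py
├── modules/
│   ├── retriever.py
│   └── response_synthesizer.py
└── database/   # the downloaded ChromaDB/SQLite files go here
```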
Prepare your .env file:
Ensure you have a file named .env in your project directory with the following content:
```
EMBEDDING_MODEL="text-embedding-3-small"
LANGUAGE_MODEL="openai/gpt-5-nano"
OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
```

1) Index Your Data
Download the database
For this guide, we provide a preset SQLite database preloaded with chunks from imaginary_animals.csv. You can download it here.
Arrange the files
Arrange them in your project. You can follow the structure in Project Setup section.
You may use another knowledge base file and adjust accordingly. For more information about indexing data into a data store, see Index Your Data with Vector Data Store.
2) Build Core Components of Your Pipeline
Create the Retriever
The Retriever finds and pulls useful information from your ChromaDB database. Create modules/retriever.py:
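As a starting point, here is a minimal sketch of `modules/retriever.py`. The class names (`OpenAIEMInvoker`, `ChromaVectorDataStore`, `BasicVectorRetriever`) come from this guide, but the import paths, constructor arguments, collection name, and storage path are assumptions; adapt them to the actual GL SDK API and your setup.

```python
# modules/retriever.py
# A minimal sketch. Import paths, constructor arguments, collection name,
# and storage path below are assumptions, not verbatim from the SDK reference.
import os

from dotenv import load_dotenv
from gllm_datastore.vector_data_store import ChromaVectorDataStore  # assumed path
from gllm_inference.em_invoker import OpenAIEMInvoker  # assumed path
from gllm_retrieval.retriever import BasicVectorRetriever  # assumed path

# Environment Loading: read settings from your .env file
load_dotenv()

# Embedding Model: converts text into vector embeddings for similarity search
em_invoker = OpenAIEMInvoker(
    model_name=os.getenv("EMBEDDING_MODEL"),
    api_key=os.getenv("OPENAI_API_KEY"),
)

# Data Store: connects to your local ChromaDB with persistent storage
data_store = ChromaVectorDataStore(
    collection_name="imaginary_animals",  # assumed collection name
    persist_directory="database",  # assumed path to the downloaded database
    em_invoker=em_invoker,
)

# Retriever: performs vector similarity search to find relevant documents
retriever = BasicVectorRetriever(data_store=data_store)
```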
Key Components Explained:

- Environment Loading: Load settings from your `.env` file.
- Embedding Model: `OpenAIEMInvoker` converts text into vector embeddings for similarity search.
- Data Store: `ChromaVectorDataStore` connects to your local ChromaDB with persistent storage.
- Retriever: `BasicVectorRetriever` performs vector similarity search to find relevant documents.
Create the Response Synthesizer
The response synthesizer generates the final answer using the retrieved context and user query.
Create modules/response_synthesizer.py:
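Here is a minimal sketch of `modules/response_synthesizer.py`. The `build_lm_request_processor()` helper and `ResponseSynthesizer.static_list` are named in this guide, but their import paths, parameter names, and the prompt templates below are assumptions; check them against the GL SDK reference.

```python
# modules/response_synthesizer.py
# A minimal sketch. Import paths, parameter names, and prompt templates
# are assumptions — verify against the GL SDK reference.
import os

from dotenv import load_dotenv
from gllm_generation.response_synthesizer import ResponseSynthesizer  # assumed path
from gllm_inference.builder import build_lm_request_processor  # assumed path

load_dotenv()

# System Prompt: use only the provided context; degrade gracefully otherwise
SYSTEM_PROMPT = (
    "Answer the user's question using only the provided context. "
    "If the context is insufficient, say that you don't know."
)

# User Prompt: templates the user's question and the retrieved chunks
USER_PROMPT = "Context:\n{chunks}\n\nQuestion: {query}"

# LM Request Processor: built with the helper function for simplified setup
lm_request_processor = build_lm_request_processor(
    model_id=os.getenv("LANGUAGE_MODEL"),  # e.g. "openai/gpt-5-nano"
    credentials=os.getenv("OPENAI_API_KEY"),
    system_template=SYSTEM_PROMPT,
    user_template=USER_PROMPT,
)

# Response Synthesizer: combines all given chunks into a single prompt
response_synthesizer = ResponseSynthesizer.static_list(
    lm_request_processor=lm_request_processor,
)
```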
Key Components:

- System Prompt: Instructs the model to use only the provided context and handle insufficient information gracefully.
- User Prompt: Templates the user's question for the model.
- LM Request Processor: Built using the `build_lm_request_processor()` helper function for simplified setup.
- Response Synthesizer: `ResponseSynthesizer.static_list` combines all given chunks into a single prompt for response generation.
3) Build the Pipeline
We'll build the full process in your pipeline.py file using GL SDK's pipeline. Open the file and follow these instructions to create steps and compose them:
Import the helpers and components
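For example (the `step` helper's module path is an assumption; the component imports match the modules created above):

```python
# pipeline.py — imports; the step helper's module path is an assumption
from gllm_pipeline.steps import step  # assumed path

from modules.retriever import retriever
from modules.response_synthesizer import response_synthesizer
```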
Create the Retriever Step
This component step searches for relevant chunks based on the user's query.
Here, the `query` input takes its value from the user input (`user_query`). We also configure `top_k` to control how many results are retrieved.
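A sketch of the retriever step, assuming a `step()` helper; its signature and key names are inferred from the mappings described in this guide:

```python
# A sketch of the retriever step; signature and key names are assumptions.
retriever_step = step(
    component=retriever,
    input_state_map={"query": "user_query"},  # query <- user input in the state
    output_state="chunks",  # retrieved chunks are written here
    fixed_args={"top_k": 5},  # how many results to retrieve (assumed mechanism)
)
```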
Create the Response Synthesizer Step
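A sketch of this step under the same assumed `step()` helper; the input mappings follow the Note below:

```python
# A sketch of the response synthesizer step; the signature is an assumption,
# and the input mappings follow the Note below.
response_synthesizer_step = step(
    component=response_synthesizer,
    input_state_map={
        "query": "user_query",
        "chunks": "chunks",  # primary data flow in the state
        "kwargs": "chunks",  # validation compatibility: kwargs must not be empty
    },
    output_state="response",
)
```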
Key Components:

- Response Synthesizer: `ResponseSynthesizer.stuff()` combines all context into a single prompt for response generation.

Note:

- `"chunks": "chunks"` is the primary data flow in the state.
- `"kwargs": "chunks"` is for validation compatibility, since `kwargs` must not be empty.
Connect Everything into a Pipeline
Finally, use the pipe operator (`|`) to chain all steps in order:
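Conceptually, using the step names from the sketches above:

```python
# Chain the steps in order; each step reads from and writes to the shared state
pipeline = retriever_step | response_synthesizer_step
```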
4) Run the Pipeline
Configure and invoke the pipeline
Configure the state and config for direct pipeline invocation:
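A sketch of direct invocation at the bottom of `pipeline.py`. Whether the pipeline exposes an async `invoke()` and the exact state/config keys are assumptions to check against the SDK:

```python
# At the bottom of pipeline.py — a sketch of direct invocation.
# invoke() being async and the state/config keys are assumptions.
import asyncio

if __name__ == "__main__":
    state = {"user_query": "Tell me about one of the imaginary animals."}
    config = {"top_k": 5}  # runtime configuration

    result = asyncio.run(pipeline.invoke(state, config))
    print(result.get("response", result))
```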
Run pipeline.py file
Observe output
If you have run all the steps successfully, the pipeline will print a generated answer grounded in the retrieved chunks.
Congratulations! You've successfully built your first RAG pipeline.