Index Your Data with Vector Data Store
This guide walks you through setting up a data store and indexing your local data into it.
Installation
Linux / macOS:
# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-inference gllm-datastore
Windows (Command Prompt):
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-inference gllm-datastore
You can either:
- Use the complete guide files from the GitHub Cookbook Repository, and refer to this guide whenever you need an explanation or want to clarify how each part works.
- Follow along with each step to recreate the files yourself while learning about the components and how to integrate them.
Both options work; choose based on whether you prefer speed or learning by doing!
Initialize Vector Data Store
First, we need to set up a vector data store. In this example, we will use an in-memory Chroma Vector Data Store. To initialize the data store, we need two components: an EM (embedding model) Invoker and the Vector Data Store itself.
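To make the relationship between the two components concrete, here is a minimal sketch in plain Python. It does not use gllm-inference or gllm-datastore: ToyEMInvoker, its embed() method, and ToyVectorStore are hypothetical stand-ins, and the "embedding" is a toy bag-of-words vector rather than a real model. It only illustrates the design idea: the store does not compute embeddings itself but delegates that work to the invoker it is given.

```python
# Hypothetical stand-ins, NOT the gllm-inference / gllm-datastore API.
import math
import re


class ToyEMInvoker:
    """Hypothetical EM invoker: maps text to a unit-length vector of size dim."""

    def __init__(self, dim: int = 16) -> None:
        self.dim = dim

    def embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for word in re.findall(r"[a-z]+", text.lower()):
            # Bucket each word by the sum of its code points (deterministic across runs).
            vec[sum(map(ord, word)) % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]


class ToyVectorStore:
    """Hypothetical in-memory store that delegates embedding to its invoker."""

    def __init__(self, em_invoker: ToyEMInvoker) -> None:
        self.em_invoker = em_invoker
        self.rows: list[tuple[str, list[float]]] = []

    def add_text(self, text: str) -> None:
        self.rows.append((text, self.em_invoker.embed(text)))


# Mirror the two-component setup: build the invoker, then hand it to the store.
store = ToyVectorStore(em_invoker=ToyEMInvoker(dim=16))
store.add_text("Cats are small domesticated carnivores.")
print(len(store.rows), len(store.rows[0][1]))  # 1 16
```

Swapping ToyEMInvoker for a different invoker would change how text is embedded without touching the store, which is the point of passing the invoker in rather than hard-coding it.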
Option 1: Directly from a Chunk
All data stores support storing data in a structured format using the Chunk schema. Think of chunks as standardized containers for your data: they provide a consistent way to represent information across different storage types, making it easy to switch between data stores or combine them in your application.
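A rough sketch of what such a container holds, assuming a content string, a free-form metadata dictionary, and an ID. This Chunk dataclass is a hypothetical stand-in; the real Chunk schema in gllm-datastore may use different field names or additional fields.

```python
# Hypothetical Chunk-like container; field names are assumptions, not gllm-datastore's schema.
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Chunk:
    content: str  # the text that will be embedded and searched
    metadata: dict[str, Any] = field(default_factory=dict)  # extra, non-embedded info
    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # unique per chunk


chunks = [
    Chunk(content="Elephants are the largest land animals.", metadata={"animal": "elephant"}),
    Chunk(content="Penguins are flightless birds.", metadata={"animal": "penguin"}),
]
for chunk in chunks:
    print(chunk.id[:8], chunk.metadata["animal"], chunk.content)
```

Using default_factory gives every chunk its own metadata dictionary and a fresh ID, so chunks never accidentally share mutable state.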
After that, we can simply pass the chunks to the add_chunks() method provided by the Vector Data Store.
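Conceptually, adding chunks means embedding each chunk's content and storing the vector alongside the chunk. The sketch below shows that idea with hypothetical stand-ins (Chunk, embed_batch, InMemoryStore) and a toy embedding; it is not the gllm-datastore implementation, and having add_chunks() return a count is an assumption made for this example.

```python
# Hypothetical sketch of what an add_chunks() call does conceptually.
# Chunk, embed_batch and InMemoryStore are illustrative stand-ins, not gllm-datastore.
import math
import re
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Chunk:
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def embed_batch(texts: list[str], dim: int = 16) -> list[list[float]]:
    """Toy embedding: unit-length bag-of-words counts, bucketed by code-point sum."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for word in re.findall(r"[a-z]+", text.lower()):
            vec[sum(map(ord, word)) % dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


class InMemoryStore:
    def __init__(self) -> None:
        self.entries: list[tuple[Chunk, list[float]]] = []

    def add_chunks(self, chunks: list[Chunk]) -> int:
        # One batched embedding call for all chunks instead of one call per chunk.
        vectors = embed_batch([chunk.content for chunk in chunks])
        self.entries.extend(zip(chunks, vectors))
        return len(chunks)


store = InMemoryStore()
added = store.add_chunks([
    Chunk("Elephants are the largest land animals.", {"animal": "elephant"}),
    Chunk("Penguins are flightless birds.", {"animal": "penguin"}),
])
print(added, len(store.entries))  # 2 2
```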
To load the data, you can run the script below:
Option 2: Loading Data from CSV Files
For real-world applications, you'll often need to load data from structured files like CSV. Suppose your project has the following structure:
To load the data, you can run the script below:
Key features of this approach:
- Persistent Storage: Uses client_type="persistent" to save data to disk.
- Metadata Support: Stores additional information (like animal names) in chunk metadata.
- Batch Loading: Efficiently loads all CSV rows at once.
- Structured Data: Converts CSV rows into standardized Chunk objects.
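The CSV-to-Chunk conversion can be sketched with Python's standard csv module. The column names animal and description, the in-memory sample data, and the Chunk class are assumptions for illustration; match them to your own CSV file and to the library's Chunk schema.

```python
# Sketch: turn CSV rows into Chunk-like objects in one pass.
# Column names and the Chunk class are assumptions, not taken from gllm-datastore.
import csv
import io
from collections.abc import Iterable
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Chunk:
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def rows_to_chunks(lines: Iterable[str]) -> list[Chunk]:
    """Use the description as searchable content and keep the animal name as metadata."""
    reader = csv.DictReader(lines)
    return [
        Chunk(content=row["description"], metadata={"animal": row["animal"]})
        for row in reader
    ]


# Hypothetical sample data; for a real file, pass an object from
# open(your_csv_path, newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "animal,description\n"
    "elephant,Elephants are the largest land animals.\n"
    'penguin,"Penguins are flightless birds, but strong swimmers."\n'
)
chunks = rows_to_chunks(sample)
for chunk in chunks:
    print(chunk.metadata["animal"], "->", chunk.content)
```

csv.DictReader handles quoted fields, so a description containing a comma stays in a single column. Opening files with newline="" follows the csv module's own recommendation.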
CSV File Format Example:
Querying Data
To query data using semantic search, we use the query() method, which returns a list[Chunk] containing the stored chunks that are most semantically similar to the query.
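Semantic search typically embeds the query with the same model used for the stored chunks and ranks chunks by vector similarity. The sketch below assumes cosine similarity and a top_k parameter; Chunk, embed, and query are hypothetical stand-ins, not gllm-datastore's query() signature. Its toy embedding only matches shared words (here, "birds"), so it is far weaker than a real embedding model, which would also match paraphrases.

```python
# Sketch of semantic search over an in-memory index (illustrative stand-ins only).
import math
import re
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Chunk:
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def embed(text: str, dim: int = 32) -> list[float]:
    """Toy embedding: unit-length bag-of-words counts, bucketed by code-point sum."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z]+", text.lower()):
        vec[sum(map(ord, word)) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def query(index: list[tuple[Chunk, list[float]]], text: str, top_k: int = 1) -> list[Chunk]:
    """Return the top_k chunks most similar to text, best match first."""
    q = embed(text)
    # All vectors are unit-length, so the dot product equals cosine similarity.
    ranked = sorted(index, key=lambda entry: sum(a * b for a, b in zip(q, entry[1])), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


chunks = [
    Chunk("Elephants are the largest land animals.", {"animal": "elephant"}),
    Chunk("Penguins are flightless birds.", {"animal": "penguin"}),
]
# Embed each chunk once at indexing time; only the query is embedded per search.
index = [(chunk, embed(chunk.content)) for chunk in chunks]
results = query(index, "Which birds cannot fly?", top_k=1)
print(results[0].metadata["animal"])  # penguin
```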
When querying data loaded from CSV, you can access both content and metadata:
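A short sketch of reading both fields from each result. The Chunk stand-in assumes content and metadata attributes, as this section suggests, and the results list is hypothetical sample data rather than real query output. Using metadata.get() with a default keeps the loop from failing on chunks that were stored without an animal field.

```python
# Sketch: reading content and metadata from query results (stand-in Chunk class).
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Chunk:
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def format_results(results: list[Chunk]) -> list[str]:
    lines = []
    for rank, chunk in enumerate(results, start=1):
        # .get() avoids a KeyError for chunks stored without the "animal" field.
        animal = chunk.metadata.get("animal", "unknown")
        lines.append(f"{rank}. [{animal}] {chunk.content}")
    return lines


# Hypothetical results, standing in for the list[Chunk] returned by query().
results = [
    Chunk("Penguins are flightless birds.", {"animal": "penguin"}),
    Chunk("Bats are the only mammals capable of sustained flight."),
]
for line in format_results(results):
    print(line)
# 1. [penguin] Penguins are flightless birds.
# 2. [unknown] Bats are the only mammals capable of sustained flight.
```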
📂 Complete Guide Files
For the complete code, please visit our GitHub Cookbook Repository.