Data Store
What's a Data Store?
A Data Store is a flexible, capability-based abstraction for storing and querying text chunks. It acts as a lightweight shell where you plug in only the features you need—fulltext search, vector search, or both.
Because all backends inherit from the same base class, the public API stays consistent. For example, switching from Chroma to Elasticsearch (or any other backend) means changing only the constructor; your code that interacts with store.fulltext or store.vector stays the same.
This design gives you a single entry point — one store, one set of handlers — regardless of how or where your data is persisted.
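To make the backend-swap claim concrete, here is a hedged sketch: the Elasticsearch class name, module path, and constructor arguments shown in the comments are placeholders, not confirmed API; only the Chroma constructor appears elsewhere on this page.

from gllm_datastore.data_store.chroma.data_store import ChromaDataStore, ChromaClientType

# Chroma-backed store.
store = ChromaDataStore(
    collection_name="customer-notes",
    client_type=ChromaClientType.MEMORY,
).with_fulltext()

# Hypothetical Elasticsearch swap -- class name, module path, and kwargs are placeholders:
# from gllm_datastore.data_store.elasticsearch.data_store import ElasticsearchDataStore
# store = ElasticsearchDataStore(index_name="customer-notes").with_fulltext()

# Either way, downstream code talks to the same handler:
# await store.fulltext.retrieve_fuzzy("refund policy")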
Installation
# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ gllm-datastore

# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-datastore"

# you can use a Conda environment
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-datastore

This new Data Store interface is available in gllm_datastore >= v0.5.32.
For earlier versions, please refer to Vector Data Store (Legacy).
Quick start
from gllm_datastore.data_store.chroma.data_store import ChromaDataStore, ChromaClientType
from gllm_datastore.core.filters import filter as F
from gllm_inference.em_invoker.openai_em_invoker import OpenAIEMInvoker
em_invoker = OpenAIEMInvoker(model_name="text-embedding-3-small")
store = (
    ChromaDataStore(
        collection_name="customer-notes",
        client_type=ChromaClientType.MEMORY,
    )
    .with_fulltext()
    .with_vector(em_invoker=em_invoker)
)

Now store.fulltext and store.vector are ready. Every capability exposes async CRUD helpers, so call them inside an async context:
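For illustration, a minimal sketch of that async usage follows. The create and retrieve method names are assumptions (this page only says each capability exposes async CRUD helpers), so check the API Reference for the exact signatures.

import asyncio

from gllm_core.schema import Chunk


async def main() -> None:
    # Keyword-argument construction of Chunk is assumed here.
    chunk = Chunk(id="note-1", content="Customer asked about invoices.", metadata={"topic": "billing"})

    # Hypothetical CRUD helper names -- verify against the API Reference.
    await store.fulltext.create([chunk])
    await store.vector.create([chunk])

    hits = await store.vector.retrieve("questions about billing", top_k=3)
    print(hits)


asyncio.run(main())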
Capability menu
Fulltext capability
Reads and writes plain text chunks plus metadata.
Supports exact filters through QueryFilter or the helper filter API.
Offers fuzzy search via retrieve_fuzzy.
Needed when you want to turn the data store into a cache (store.as_cache(...) requires fulltext).
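For orientation, a non-authoritative sketch of the filter and fuzzy-search items above; the retrieve_fuzzy parameters, the retrieve method, and the F.eq helper are assumptions inferred from names on this page.

from gllm_datastore.core.filters import filter as F


async def fulltext_examples():
    # Fuzzy lookup; num_candidates is the optional knob listed under "Registering capabilities".
    fuzzy_hits = await store.fulltext.retrieve_fuzzy("invocing typo", num_candidates=10)

    # Exact metadata filter via the helper filter API.
    # F.eq(...) is a hypothetical constructor -- check gllm_datastore.core.filters for the real helpers.
    billing_notes = await store.fulltext.retrieve(filters=F.eq("topic", "billing"))
    return fuzzy_hits, billing_notes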
Vector capability
Stores embeddings and enables semantic search.
Needs an embedding model invoker (BaseEMInvoker) when you register it.
Lets you mix semantic and metadata filters.
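Similarly, a hedged sketch of a semantic query mixed with a metadata filter; the retrieve method name and its parameters are assumptions, not confirmed API.

async def vector_example():
    # Hypothetical signature: semantic query text, a result limit, and an exact metadata filter.
    results = await store.vector.retrieve(
        "customers asking about refunds",
        top_k=5,
        filters=F.eq("topic", "billing"),  # F is the filter helper imported in the quick start
    )
    return results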
Registering capabilities
Each backend inherits from BaseDataStore, so the registration methods (with_fulltext, with_vector) are the same everywhere; only their keyword arguments vary per backend.
Fulltext: register with with_fulltext(**kwargs)
Required arguments: depend on the backend (for Chroma: collection_name, client)
Optional arguments: num_candidates for fuzzy search

Vector: register with with_vector(em_invoker=...)
Required arguments: em_invoker is mandatory
Optional arguments: num_candidates (backend specific)
Registration returns the same store, so you can chain calls. When a capability is missing, you will get a NotRegisteredException the moment you access store.vector or store.fulltext.
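The behaviour is easy to see with a store that registers only one capability (a sketch; the exception is referenced by name only, exactly as above):

store = ChromaDataStore(
    collection_name="customer-notes",
    client_type=ChromaClientType.MEMORY,
).with_fulltext()  # with_vector() intentionally not called

handler = store.fulltext   # fine: the fulltext capability is registered
handler = store.vector     # raises NotRegisteredException: vector was never registered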
Using the store end to end
1. Prepare chunks
Use gllm_core.schema.Chunk. Each chunk must have an id and content; metadata is optional.
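A minimal sketch of the chunk objects; keyword-argument construction is an assumption about the Chunk schema.

from gllm_core.schema import Chunk

chunks = [
    Chunk(
        id="note-1",
        content="Customer asked how to update their billing address.",
        metadata={"topic": "billing", "priority": "low"},
    ),
    Chunk(
        id="note-2",
        content="Customer reported a login loop on the mobile app.",
        metadata={"topic": "auth", "priority": "high"},
    ),
]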
2. Write data
Write through both handlers only if you registered both capabilities; otherwise skip the one you did not register.
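A hedged sketch of the write step, reusing the chunks above; the create method name is an assumption about the async CRUD helpers.

async def write_chunks():
    await store.fulltext.create(chunks)  # plain text plus metadata (hypothetical method name)
    await store.vector.create(chunks)    # embeds content through the registered em_invoker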
3. Query data
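Again a sketch only; the retrieval method names below are assumptions, so confirm them in the API Reference.

async def query_chunks():
    # Fuzzy text lookup on the fulltext capability (hypothetical signature).
    typo_hits = await store.fulltext.retrieve_fuzzy("billing adress")

    # Semantic search on the vector capability.
    semantic_hits = await store.vector.retrieve("login problems on mobile", top_k=3)
    return typo_hits, semantic_hits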
Takeaways
Register only the capabilities you plan to use.
Interact with capabilities through the handler properties (store.fulltext, store.vector).
Backends differ in setup but stay compatible at the capability level.
API Reference
For more information about the data store, please take a look at our API Reference page.