Data Store

What's a Data Store?

A Data Store is a flexible, capability-based abstraction for storing and querying text chunks. It acts as a lightweight shell where you plug in only the features you need—fulltext search, vector search, hybrid search (fulltext + vector in one call), or a combination.

Because all backends inherit from the same base class, the public API stays consistent. For example, switching from Chroma to Elasticsearch (or any other backend) means changing only the constructor; your code that interacts with store.fulltext, store.vector, or store.hybrid stays the same.

This design gives you a single entry point — one store, one set of handlers — regardless of how or where your data is persisted.
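
For illustration, here is a minimal sketch of such a swap. The Elasticsearch import path and constructor arguments below are assumptions made for this example (check the backend's documentation for the exact names); the point is that the capability registration and handler calls stay identical.

# Hypothetical Elasticsearch setup: the import path and constructor arguments are
# placeholders for illustration; only the constructor differs from the Chroma example.
from gllm_datastore.data_store.elasticsearch.data_store import ElasticsearchDataStore

store = (
    ElasticsearchDataStore(index_name="customer-notes")  # assumed constructor arguments
    .with_fulltext()
    .with_vector(em_invoker=em_invoker)  # em_invoker as defined in the Quick start below
)
# store.fulltext, store.vector, and store.hybrid are used exactly as before.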

Prerequisites

This example specifically requires completion of all setup steps listed on the Prerequisites page.

Installation

# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ gllm-datastore

Quick start

from gllm_datastore.data_store.chroma.data_store import ChromaDataStore, ChromaClientType
from gllm_datastore.core.filters import filter as F
from gllm_inference.em_invoker.openai_em_invoker import OpenAIEMInvoker

em_invoker = OpenAIEMInvoker(model_name="text-embedding-3-small")  # embedding model for the vector capability

# Build an in-memory Chroma store, then register the fulltext and vector capabilities.
store = (
    ChromaDataStore(
        collection_name="customer-notes",
        client_type=ChromaClientType.MEMORY,
    )
    .with_fulltext()
    .with_vector(em_invoker=em_invoker)
)

Now store.fulltext and store.vector are ready. Every capability exposes async CRUD helpers, so call them inside an async context:
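
For example, a minimal async round trip, continuing from the Quick start. The create() and retrieve() helper names below are assumptions (they mirror store.hybrid.create() and store.hybrid.retrieve()); confirm the exact names in the API Reference.

import asyncio

from gllm_core.schema import Chunk


async def main():
    # Write one chunk through the fulltext handler, then search it back.
    # create() and retrieve() are assumed helper names for the fulltext handler.
    note = Chunk(id="note-1", content="Customer asked about invoice 1042.", metadata={"topic": "billing"})
    await store.fulltext.create([note])
    results = await store.fulltext.retrieve("invoice")
    print(results)


asyncio.run(main())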

Capability menu

Fulltext capability

  1. Reads and writes plain text chunks plus metadata.

  2. Supports exact filters through QueryFilter or the helper filter API.

  3. Offers fuzzy search via retrieve_fuzzy (see the sketch after this list).

  4. Needed when you want to turn the data store into a cache (store.as_cache(...) requires fulltext).
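
A short sketch of fulltext reads, inside an async context and continuing from the Quick start (where store and the F filter helper are defined). The retrieve() name, the filters keyword, and F.eq(...) are assumptions for illustration; retrieve_fuzzy comes from the list above.

# retrieve(), the filters keyword, and F.eq(...) are assumed names; adjust to your version.
exact_hits = await store.fulltext.retrieve(
    query="invoice",
    filters=F.eq("topic", "billing"),  # exact metadata filter; QueryFilter works as well
)

# Fuzzy search tolerates small typos in the query.
fuzzy_hits = await store.fulltext.retrieve_fuzzy(query="invocie")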

Vector capability

  1. Stores embeddings and enables semantic search.

  2. Needs an embedding model invoker (BaseEMInvoker) when you register it.

  3. Lets you mix semantic and metadata filters, as shown in the sketch below.
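
A similar sketch for semantic search with a metadata filter, inside an async context and continuing from the Quick start. The retrieve() name, the filters keyword, and top_k are assumed parameter names.

# retrieve(), filters, and top_k are assumed names; adjust to your version.
semantic_hits = await store.vector.retrieve(
    query="customers complaining about late invoices",
    filters=F.eq("topic", "billing"),
    top_k=5,
)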

Hybrid capability

  1. Combines fulltext (e.g. BM25) and vector search in a single query with configurable weights.

  2. Configure via a list of SearchConfig (FULLTEXT and/or VECTOR); each VECTOR entry requires an embedding model invoker.

  3. Use store.hybrid.create(), store.hybrid.retrieve(), and store.hybrid.retrieve_by_vector() for unified indexing and retrieval (see the sketch below).
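
A sketch of hybrid registration and retrieval. The SearchConfig import path and field names below are assumptions based on the description above; the handler methods come from item 3.

# The import path and SearchConfig fields are assumed; check the API Reference for the exact schema.
from gllm_datastore.data_store.data_store import SearchConfig

store = store.with_hybrid(
    config=[
        SearchConfig(search_type="FULLTEXT", weight=0.4),
        SearchConfig(search_type="VECTOR", weight=0.6, em_invoker=em_invoker),
    ]
)

# Inside an async context: unified indexing and weighted retrieval (chunks is a list of Chunk objects).
await store.hybrid.create(chunks)
results = await store.hybrid.retrieve("late invoice complaints")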

Graph capability

  1. Stores graph data (nodes and edges) and enables graph traversal.

  2. Supports graph algorithms and complex relationship queries.

Encryption capability

  1. Provides transparent field-level encryption for chunk content and metadata.

  2. Works seamlessly with fulltext and vector capabilities.

  3. Encrypts data during write operations and decrypts during read operations.

  4. See Encryption for detailed usage and configuration; a brief registration sketch follows this list.
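
A brief registration sketch. The encryptor object and the field-path syntax for metadata below are hypothetical placeholders; the encryptor and fields keywords come from the registration table further down.

# my_encryptor is a hypothetical placeholder for whatever encryptor object the library accepts;
# the "metadata.ssn" field-path syntax is also an assumption for illustration.
store = store.with_encryption(
    encryptor=my_encryptor,
    fields=["content", "metadata.ssn"],
)
# Writes are encrypted transparently; reads return decrypted chunks.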

Key-Value capability

  1. Provides versioned storage for secrets and sensitive configuration.

  2. Supports version control, soft-deletion, and Check-and-Set (CAS) operations.

  3. Enables atomic updates and prevents concurrent modification conflicts.

Registering capabilities

Each backend inherits from BaseDataStore, so the registration keywords are always the same.

Capability | Register with | Required arguments | Common extras
Fulltext | with_fulltext(**kwargs) | Depends on backend (for Chroma: collection_name, client) | num_candidates for fuzzy search
Vector | with_vector(em_invoker=...) | em_invoker is mandatory | num_candidates, backend-specific
Hybrid | with_hybrid(config=...) | config (list of SearchConfig) is mandatory | Backend-specific
Graph | with_graph(**kwargs) | Depends on backend | -
Encryption | with_encryption(encryptor=...) | encryptor and fields are mandatory | -

Registration returns the same store, so you can chain calls. If a capability has not been registered, accessing store.vector, store.fulltext, or store.hybrid raises NotRegisteredException.
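
For example, a chained registration followed by a guarded access. The import path of NotRegisteredException below is an assumption; everything else comes from the Quick start.

# Each with_*() call returns the same store, so registration chains naturally.
store = (
    ChromaDataStore(collection_name="customer-notes", client_type=ChromaClientType.MEMORY)
    .with_fulltext()
    .with_vector(em_invoker=em_invoker)
)

# Accessing an unregistered capability raises NotRegisteredException.
# The import path below is assumed; adjust it to where the exception lives in your version.
from gllm_datastore.exceptions import NotRegisteredException

try:
    store.hybrid  # hybrid was not registered above
except NotRegisteredException:
    print("Register hybrid with .with_hybrid(config=...) before using store.hybrid")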

Using the store end to end

1. Prepare chunks

Use gllm_core.schema.Chunk. Each chunk needs an id and content, plus optional metadata.
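
For example (the keyword form below assumes Chunk accepts id, content, and metadata as constructor arguments):

from gllm_core.schema import Chunk

chunks = [
    Chunk(
        id="note-1",
        content="Customer asked about invoice 1042.",
        metadata={"topic": "billing", "priority": "high"},
    ),
    Chunk(
        id="note-2",
        content="Follow-up call scheduled for next Tuesday.",
        metadata={"topic": "support"},
    ),
]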

2. Write data

Write through each capability you registered. Call both the fulltext and vector write helpers only when you registered both capabilities; otherwise skip the missing one.
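
A sketch of the write step, inside an async context. The create() helper name is an assumption that mirrors store.hybrid.create().

# Write the prepared chunks through each registered capability.
# create() is an assumed helper name, mirroring store.hybrid.create().
await store.fulltext.create(chunks)
await store.vector.create(chunks)  # skip this line if the vector capability is not registered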

3. Query data

When the backend supports the hybrid capability, register it with with_hybrid(config=...) and use store.hybrid for create and retrieve. Hybrid combines fulltext and vector scores in one call with configurable weights.
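
A sketch of the query step, inside an async context. store.hybrid.retrieve() comes from the hybrid capability above; the per-capability retrieve() names are assumptions.

# Hybrid retrieval blends fulltext and vector scores using the weights from your SearchConfig list.
results = await store.hybrid.retrieve("late invoice complaints")

# Or query a single capability directly (retrieve() is an assumed helper name).
keyword_hits = await store.fulltext.retrieve("invoice")
semantic_hits = await store.vector.retrieve("customers unhappy about billing")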

Takeaways

  • Register only the capabilities you plan to use.

  • Interact with capabilities through the handler properties (store.fulltext, store.vector, store.hybrid when registered).

  • Backends differ in setup but stay compatible at the capability level.

API Reference

For more information about the data store, please take a look at our API Reference page.
