memoryCache

What's a Cache?

Applications often re-run the same expensive operations—database lookups, embeddings generation, fuzzy searches, or API calls. A cache avoids repeating this work by storing results and serving them instantly on the next request. Using a cache improves performance, reduces backend load, and keeps response times predictable, especially under heavy traffic.

Cache rides on top of Data Store. Once the store has the right capabilities, you get a decorator-based cache that is easy to use and keeps your code short and readable.

Prerequisites

This example requires completing all setup steps listed on the Prerequisites page.

Installation

# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ gllm-datastore

Quick Start

The cache can be used in two ways: as a decorator on your async functions, or through direct method calls. The decorator style is best when you want automatic memoization with zero boilerplate—your function stays clean, and the cache handles key generation, storage, and retrieval for you. Direct calls give you full control: you can store arbitrary payloads, manage metadata, or perform manual lookups without decorating a function.

Simple cache

This basic caching flow keeps everything inside the data store. Call store.as_cache() (requires the fulltext capability), take the decorator, and wrap your async function. Every cached result is stored as a chunk, so results persist across process restarts. The next time get_user() is called with the same user_id, the decorator intercepts the call, checks the store, and returns the cached result instantly.
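The cycle described above can be sketched in a self-contained way. This is not the SDK's implementation: a plain dict stands in for the data store's chunk storage, and the `cached` decorator and `get_user` function are illustrative names showing what a decorator returned by store.as_cache() conceptually does (build a key, check the store, fall through to the function).

```python
import asyncio
import functools
import json

# Stand-in for the data store's chunk storage. In the real SDK the decorator
# comes from store.as_cache(); this sketch only illustrates the cycle it
# performs: build a key, check the store, and fall through to the function.
_chunks: dict[str, object] = {}

def cached(func):
    """Memoize an async function, keyed on its name and arguments."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        key = json.dumps([func.__name__, args, kwargs], sort_keys=True)
        if key in _chunks:               # cache hit: skip the expensive call
            return _chunks[key]
        result = await func(*args, **kwargs)
        _chunks[key] = result            # persist the result as a "chunk"
        return result
    return wrapper

calls = 0

@cached
async def get_user(user_id: int) -> dict:
    global calls
    calls += 1                           # track how often the body actually runs
    return {"id": user_id, "name": f"user-{user_id}"}

async def main():
    first = await get_user(42)
    second = await get_user(42)          # served from the cache
    print(first == second, calls)        # → True 1

asyncio.run(main())
```

The second call never reaches the function body; the decorator resolves it from the store.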

Semantic cache

Here is an example of a semantic cache that uses the vector capability of the data store.

In this mode, the cache uses the vector capability under the hood. The EM invoker converts the key (e.g., the question text) into an embedding, stores it via store.vector, and later performs a semantic lookup to find "close enough" keys. This is ideal when user queries vary in wording but should map to the same answer—perfect for LLM-powered Q&A or retrieval-augmented interfaces.
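The "close enough" lookup can be illustrated with a minimal, self-contained sketch. A toy bag-of-words embedder stands in for the SDK's EM invoker, and the `SemanticCache` class here is hypothetical: in the real cache, embeddings are stored via store.vector and similarity search happens in the backend.

```python
import math

# Toy embedder standing in for the EM invoker: a bag-of-words vector over a
# fixed vocabulary. Real embeddings come from an embedding model.
VOCAB = ["reset", "password", "change", "my", "how", "do", "i"]

def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Store answers under embedded keys; look up by similarity threshold."""
    def __init__(self, threshold: float = 0.8):
        self.entries: list[tuple[list[float], str]] = []
        self.threshold = threshold

    def store(self, key: str, value: str) -> None:
        self.entries.append((embed(key), value))

    def retrieve(self, key: str):
        query = embed(key)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best and cosine(query, best[0]) >= self.threshold:
            return best[1]               # a "close enough" key matched
        return None

cache = SemanticCache()
cache.store("how do i reset my password", "Use the account settings page.")
# Different wording, same intent — resolved by embedding similarity:
print(cache.retrieve("how do i change my password"))
```

The second query shares most of its words with the stored key, so its cosine similarity clears the threshold and the cached answer is returned despite the different wording.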

Direct method call

If you prefer explicit control, you can call the cache methods directly:

Direct methods behave exactly like the decorator but without wrapping a function:

  1. store() and retrieve() are async and map directly to the underlying data store handlers.

  2. delete() accepts a single key or a list of keys and uses metadata filters internally.

  3. clear() removes all cache entries from the collection—very useful during integration tests or environment resets.
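The direct-call surface described in the list above can be sketched as follows. The `DirectCache` class and its in-memory dict are illustrative stand-ins; in the SDK these methods map onto data store handlers and metadata filters rather than a local dict.

```python
import asyncio

class DirectCache:
    """In-memory sketch of the direct-call surface: store(), retrieve(),
    delete() (single key or list), and clear()."""
    def __init__(self):
        self._entries: dict[str, object] = {}

    async def store(self, key: str, value: object) -> None:
        self._entries[key] = value

    async def retrieve(self, key: str):
        return self._entries.get(key)

    async def delete(self, keys) -> None:
        # Accept a single key or a list of keys, as described above.
        for key in [keys] if isinstance(keys, str) else keys:
            self._entries.pop(key, None)

    async def clear(self) -> None:
        self._entries.clear()

async def demo():
    cache = DirectCache()
    await cache.store("a", 1)
    await cache.store("b", 2)
    assert await cache.retrieve("a") == 1
    await cache.delete("a")              # single-key delete
    assert await cache.retrieve("a") is None
    await cache.clear()                  # wipe the whole collection
    assert await cache.retrieve("b") is None
    return cache

cache = asyncio.run(demo())
```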


Cache Eviction

Some data stores do not ship with TTL or size-based eviction. The SDK adds a pluggable abstraction so you can run consistent policies regardless of backend limits. An eviction manager runs the policy loop, and each policy is implemented as an eviction strategy. When you pass a manager to as_cache, the cache asks the strategy to enrich metadata before persisting the chunk.

Use an eviction manager when:

  1. Your backend lacks built-in TTL or you want the same policy across multiple backends.

  2. You need metadata-driven eviction (for example, "delete anything past 500 hits").

  3. You plan to combine eviction with exact, fuzzy, or semantic matching and want uniform behavior.

TTLEvictionStrategy sets expiration metadata on each cache entry, and AsyncIOEvictionManager periodically deletes expired entries from the backing store. If you need a different policy today, implement a custom BaseEvictionStrategy and pair it with an eviction manager.
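The strategy/manager split can be sketched in miniature. This is not the SDK's TTLEvictionStrategy or AsyncIOEvictionManager: the classes below (note the "Sketch" suffix on the manager) only mirror the described division of labor, where the strategy stamps expiration metadata on each entry and the manager periodically sweeps expired entries out of the store.

```python
import asyncio
import time

class TTLStrategySketch:
    """Enrich each entry's metadata with an expiration timestamp."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds

    def enrich(self, metadata: dict) -> dict:
        return {**metadata, "expires_at": time.monotonic() + self.ttl}

    def is_expired(self, metadata: dict) -> bool:
        return time.monotonic() >= metadata["expires_at"]

class AsyncIOManagerSketch:
    """Periodically delete expired entries, like a background asyncio task."""
    def __init__(self, store: dict, strategy: TTLStrategySketch, interval: float):
        self.store, self.strategy, self.interval = store, strategy, interval

    async def run(self, cycles: int) -> None:
        for _ in range(cycles):
            await asyncio.sleep(self.interval)
            expired = [k for k, (_, meta) in self.store.items()
                       if self.strategy.is_expired(meta)]
            for key in expired:
                del self.store[key]      # evict past-TTL entries

async def demo():
    store: dict[str, tuple[object, dict]] = {}
    strategy = TTLStrategySketch(ttl_seconds=0.05)
    # Writing through the strategy mirrors as_cache enriching metadata.
    store["answer"] = ("42", strategy.enrich({}))
    manager = AsyncIOManagerSketch(store, strategy, interval=0.1)
    await manager.run(cycles=1)          # one sweep after the TTL has passed
    return "answer" in store

alive = asyncio.run(demo())
print(alive)  # → False
```

Keeping the policy (strategy) separate from the loop (manager) is what lets the same TTL semantics run uniformly across backends that lack native expiration.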

Takeaways

  1. The cache is a thin helper over the data store: all durability, filters, and eviction metadata live in the store.

  2. Start with the simple cache, then add an eviction manager when you need TTL or size policies that a backend cannot offer on its own.

  3. Use the same filters and tooling described in the data store guide to inspect or clean cache entries.

Eviction Components

Eviction Strategy

| Name | Status | Notes |
| --- | --- | --- |
| TTLEvictionStrategy | Available | Built-in strategy for TTL-based expiration. |
| LRU | Backlog | Planned least-recently-used eviction strategy. |
| LFU | Backlog | Planned least-frequently-used eviction strategy. |

Eviction Manager

| Name | Status | Notes |
| --- | --- | --- |
| AsyncIOEvictionManager | Available | Runs background eviction checks in an asyncio task. |
| CeleryEvictionManager | Backlog | Planned manager for running eviction through Celery workers. |

API Reference

For more detailed information about the cache and its correlation with the data store, please refer to the API Reference page.
