memoryCache

What's a Cache?

Applications often re-run the same expensive operations—database lookups, embeddings generation, fuzzy searches, or API calls. A cache avoids repeating this work by storing results and serving them instantly on the next request. Using a cache improves performance, reduces backend load, and keeps response times predictable, especially under heavy traffic.

Cache rides on top of Data Store. Once the store has the right capabilities, you get a decorator-based cache that is easy to use and keeps your code short and readable.

Prerequisites

This example requires completing all setup steps listed on the Prerequisites page.

Installation

# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ gllm-datastore

Quick Start

The cache can be used in two ways: as a decorator on your async functions, or through direct method calls. The decorator style is best when you want automatic memoization with zero boilerplate—your function stays clean, and the cache handles key generation, storage, and retrieval for you. Direct calls give you full control: you can store arbitrary payloads, manage metadata, or perform manual lookups without decorating a function.

Simple cache

This basic caching flow keeps everything inside the data store. Call store.as_cache() (requires the fulltext capability), take the decorator, and wrap your async function. Every cached result is stored as a chunk, so results persist across process restarts. The next time get_user() is called with the same user_id, the decorator intercepts the call, checks the store, and returns the cached result instantly.
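The cycle described above can be sketched in a self-contained way. This is not the SDK's implementation: a plain dict stands in for the data store's chunk storage, and the `cached` decorator and `get_user` function are illustrative names showing what a decorator returned by store.as_cache() conceptually does (build a key, check the store, fall through to the function).

```python
import asyncio
import functools
import json

# Stand-in for the data store's chunk storage. In the real SDK the decorator
# comes from store.as_cache(); this sketch only illustrates the cycle it
# performs: build a key, check the store, and fall through to the function.
_chunks: dict[str, object] = {}

def cached(func):
    """Memoize an async function, keyed on its name and arguments."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        key = json.dumps([func.__name__, args, kwargs], sort_keys=True)
        if key in _chunks:               # cache hit: skip the expensive call
            return _chunks[key]
        result = await func(*args, **kwargs)
        _chunks[key] = result            # persist the result as a "chunk"
        return result
    return wrapper

calls = 0

@cached
async def get_user(user_id: int) -> dict:
    global calls
    calls += 1                           # track how often the body actually runs
    return {"id": user_id, "name": f"user-{user_id}"}

async def main():
    first = await get_user(42)
    second = await get_user(42)          # served from the cache
    print(first == second, calls)        # → True 1

asyncio.run(main())
```

The second call never reaches the function body; the decorator resolves it from the store.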

Semantic cache

Here is an example of a semantic cache that uses the vector capability of the data store.

In this mode, the cache uses the vector capability under the hood. The EM invoker converts the key (e.g., the question text) into an embedding, stores it via store.vector, and later performs a semantic lookup to find "close enough" keys. This is ideal when user queries vary in wording but should map to the same answer—perfect for LLM-powered Q&A or retrieval-augmented interfaces.
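The "close enough" lookup can be illustrated with a minimal, self-contained sketch. A toy bag-of-words embedder stands in for the SDK's EM invoker, and the `SemanticCache` class here is hypothetical: in the real cache, embeddings are stored via store.vector and similarity search happens in the backend.

```python
import math

# Toy embedder standing in for the EM invoker: a bag-of-words vector over a
# fixed vocabulary. Real embeddings come from an embedding model.
VOCAB = ["reset", "password", "change", "my", "how", "do", "i"]

def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Store answers under embedded keys; look up by similarity threshold."""
    def __init__(self, threshold: float = 0.8):
        self.entries: list[tuple[list[float], str]] = []
        self.threshold = threshold

    def store(self, key: str, value: str) -> None:
        self.entries.append((embed(key), value))

    def retrieve(self, key: str):
        query = embed(key)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best and cosine(query, best[0]) >= self.threshold:
            return best[1]               # a "close enough" key matched
        return None

cache = SemanticCache()
cache.store("how do i reset my password", "Use the account settings page.")
# Different wording, same intent — resolved by embedding similarity:
print(cache.retrieve("how do i change my password"))
```

The second query shares most of its words with the stored key, so its cosine similarity clears the threshold and the cached answer is returned despite the different wording.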

Direct method call

If you prefer explicit control, you can call the cache methods directly:

Direct methods behave exactly like the decorator but without wrapping a function:

  1. store() and retrieve() are async and map directly to the underlying data store handlers.

  2. delete() accepts a single key or a list of keys and uses metadata filters internally.

  3. clear() removes all cache entries from the collection—very useful during integration tests or environment resets.
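The direct-call surface described in the list above can be sketched as follows. The `DirectCache` class and its in-memory dict are illustrative stand-ins; in the SDK these methods map onto data store handlers and metadata filters rather than a local dict.

```python
import asyncio

class DirectCache:
    """In-memory sketch of the direct-call surface: store(), retrieve(),
    delete() (single key or list), and clear()."""
    def __init__(self):
        self._entries: dict[str, object] = {}

    async def store(self, key: str, value: object) -> None:
        self._entries[key] = value

    async def retrieve(self, key: str):
        return self._entries.get(key)

    async def delete(self, keys) -> None:
        # Accept a single key or a list of keys, as described above.
        for key in [keys] if isinstance(keys, str) else keys:
            self._entries.pop(key, None)

    async def clear(self) -> None:
        self._entries.clear()

async def demo():
    cache = DirectCache()
    await cache.store("a", 1)
    await cache.store("b", 2)
    assert await cache.retrieve("a") == 1
    await cache.delete("a")              # single-key delete
    assert await cache.retrieve("a") is None
    await cache.clear()                  # wipe the whole collection
    assert await cache.retrieve("b") is None
    return cache

cache = asyncio.run(demo())
```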


Cache Eviction

Some data stores do not ship with TTL or size-based eviction. The SDK adds a pluggable abstraction so you can run consistent policies regardless of backend limits. An eviction manager runs the policy loop, and each policy is implemented as an eviction strategy. When you pass a manager to as_cache, the cache asks the strategy to enrich metadata before persisting the chunk.

Use an eviction manager when:

  1. Your backend lacks built-in TTL or you want the same policy across multiple backends.

  2. You need metadata-driven eviction (for example, "delete anything past 500 hits").

  3. You plan to combine eviction with exact, fuzzy, or semantic matching and want uniform behavior.

TTLEvictionStrategy sets expiration metadata on each cache entry, and AsyncIOEvictionManager periodically deletes expired entries from the backing store. If you need a different policy today, implement a custom BaseEvictionStrategy and pair it with an eviction manager.
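The strategy/manager split can be sketched in miniature. This is not the SDK's TTLEvictionStrategy or AsyncIOEvictionManager: the classes below (note the "Sketch" suffix on the manager) only mirror the described division of labor, where the strategy stamps expiration metadata on each entry and the manager periodically sweeps expired entries out of the store.

```python
import asyncio
import time

class TTLStrategySketch:
    """Enrich each entry's metadata with an expiration timestamp."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds

    def enrich(self, metadata: dict) -> dict:
        return {**metadata, "expires_at": time.monotonic() + self.ttl}

    def is_expired(self, metadata: dict) -> bool:
        return time.monotonic() >= metadata["expires_at"]

class AsyncIOManagerSketch:
    """Periodically delete expired entries, like a background asyncio task."""
    def __init__(self, store: dict, strategy: TTLStrategySketch, interval: float):
        self.store, self.strategy, self.interval = store, strategy, interval

    async def run(self, cycles: int) -> None:
        for _ in range(cycles):
            await asyncio.sleep(self.interval)
            expired = [k for k, (_, meta) in self.store.items()
                       if self.strategy.is_expired(meta)]
            for key in expired:
                del self.store[key]      # evict past-TTL entries

async def demo():
    store: dict[str, tuple[object, dict]] = {}
    strategy = TTLStrategySketch(ttl_seconds=0.05)
    # Writing through the strategy mirrors as_cache enriching metadata.
    store["answer"] = ("42", strategy.enrich({}))
    manager = AsyncIOManagerSketch(store, strategy, interval=0.1)
    await manager.run(cycles=1)          # one sweep after the TTL has passed
    return "answer" in store

alive = asyncio.run(demo())
print(alive)  # → False
```

Keeping the policy (strategy) separate from the loop (manager) is what lets the same TTL semantics run uniformly across backends that lack native expiration.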

Takeaways

  1. The cache is a thin helper over the data store: all durability, filters, and eviction metadata live in the store.

  2. Start with the simple cache, then add an eviction manager when you need TTL or size policies that a backend cannot offer on its own.

  3. Use the same filters and tooling described in the data store guide to inspect or clean cache entries.

Eviction Components

Eviction Strategy

| Name | Status | Notes |
| --- | --- | --- |
| TTLEvictionStrategy | Available | Built-in strategy for TTL-based expiration. |
| LRU | Backlog | Planned least-recently-used eviction strategy. |
| LFU | Backlog | Planned least-frequently-used eviction strategy. |

Eviction Manager

| Name | Status | Notes |
| --- | --- | --- |
| AsyncIOEvictionManager | Available | Runs background eviction checks in an asyncio task. |
| CeleryEvictionManager | Backlog | Planned manager for running eviction through Celery workers. |

API Reference

For more detailed information about the cache and its correlation with the data store, please refer to the API Reference page.
