Cache

What's a Cache?

Applications often re-run the same expensive operations: database lookups, embedding generation, fuzzy searches, or API calls. A cache avoids repeating this work by storing results and serving them instantly on the next request. Using a cache improves performance, reduces backend load, and keeps response times predictable, especially under heavy traffic.

Cache rides on top of the Data Store. Once the store has the right capabilities, you get a decorator-based cache that is easy to use and keeps your code short and readable.

Prerequisites

This example specifically requires completion of all setup steps listed on the Prerequisites page.

Installation

# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ gllm-datastore

Quick Start

The cache can be used in two ways: as a decorator on your async functions, or through direct method calls. The decorator style is best when you want automatic memoization with zero boilerplate—your function stays clean, and the cache handles key generation, storage, and retrieval for you. Direct calls give you full control: you can store arbitrary payloads, manage metadata, or perform manual lookups without decorating a function.

Simple cache

This basic caching flow keeps everything inside the data store. Call store.as_cache() (which requires the fulltext capability), take the decorator, and wrap your async function. Every cached result is stored as a chunk, so results persist across process restarts. The next time get_user() is called with the same user_id, the decorator intercepts the call, checks the store, and returns the cached result instantly.
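
Below is a minimal sketch of this flow. It assumes a store created as described in the data store guide; the build_store() and fetch_user_from_db() helpers, and using the cache object itself as the decorator, are illustrative assumptions rather than the SDK's confirmed API.

import asyncio

# Assumption: build_store() stands in for creating a data store with the
# fulltext capability, as described in the data store guide.
store = build_store()

# as_cache() returns the cache helper; here it is used directly as a decorator
# (an assumption; the decorator may be exposed under another name).
cache = store.as_cache()

@cache
async def get_user(user_id: str) -> dict:
    # Expensive lookup; this body only runs on a cache miss.
    return await fetch_user_from_db(user_id)  # hypothetical backend call

async def main():
    await get_user("42")  # miss: runs the function and stores the result as a chunk
    await get_user("42")  # hit: served straight from the data store

asyncio.run(main())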

Semantic cache

Here is an example of a semantic cache that uses the vector capability of the data store.

In this mode, the cache uses the vector capability under the hood. The EM invoker converts the key (e.g., the question text) into an embedding, stores it via store.vector, and later performs a semantic lookup to find “close enough” keys. This is ideal when user queries vary in wording but should map to the same answer—perfect for LLM-powered Q&A or retrieval-augmented interfaces.
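
A sketch of the semantic flow, under the same assumptions as above plus an EM invoker for embeddings; the build_em_invoker() and ask_llm() helpers and the em_invoker keyword argument are illustrative names, not confirmed API.

# Assumption: build_em_invoker() stands in for constructing an embedding model
# (EM) invoker; the em_invoker keyword argument is illustrative.
em_invoker = build_em_invoker()
semantic_cache = store.as_cache(em_invoker=em_invoker)

@semantic_cache
async def answer_question(question: str) -> str:
    # Only runs when no semantically similar question has been cached yet.
    return await ask_llm(question)  # hypothetical LLM call

# "What is our refund policy?" and "How do refunds work?" embed close together,
# so the second call can be answered from the first call's cached result.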

Direct method call

If you prefer explicit control, you can call the cache methods directly:
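
A sketch of the direct style, using the method names described below; the example keys and payloads are illustrative.

cache = store.as_cache()

async def manage_entries():
    await cache.store("user:42", {"name": "Ada"})  # write a payload under a key
    user = await cache.retrieve("user:42")         # manual lookup
    await cache.delete("user:42")                  # remove a single key
    await cache.delete(["user:43", "user:44"])     # or a list of keys
    await cache.clear()                            # wipe the whole collection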

Direct methods behave exactly like the decorator but without wrapping a function:

  1. store() and retrieve() are async and map directly to the underlying data store handlers.

  2. delete() accepts a single key or a list of keys and uses metadata filters internally.

  3. clear() removes all cache entries from the collection—very useful during integration tests or environment resets.


Cache Eviction

Some data stores do not ship with TTL or size-based eviction. The SDK adds a pluggable abstraction so you can run consistent policies regardless of backend limitations. EvictionManager hosts the policy, and each policy is implemented as an eviction strategy (for example, TTL or LRU). When you pass a manager to as_cache, the cache asks the strategy to enrich the metadata (such as expiry timestamps) before persisting the chunk.
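
A sketch of wiring in eviction: EvictionManager and the pass-to-as_cache pattern come from the description above, but the TTL strategy class name, its ttl_seconds argument, and the eviction_manager keyword are illustrative assumptions; check the API Reference for the real names.

# Assumption: a TTL strategy class and constructor arguments like these exist.
manager = EvictionManager(strategy=TTLEvictionStrategy(ttl_seconds=3600))
cache = store.as_cache(eviction_manager=manager)

@cache
async def get_user(user_id: str) -> dict:
    return await fetch_user_from_db(user_id)  # hypothetical backend call

# Before persisting each chunk, the cache asks the strategy to enrich its
# metadata (e.g., an expiry timestamp); expired entries are evicted later.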

Use an eviction manager when:

  1. Your backend lacks built-in TTL or you want the same policy across multiple backends.

  2. You need metadata-driven eviction (for example, "delete anything past 500 hits").

  3. You plan to combine eviction with semantic or fuzzy matching and want uniform behavior.

Takeaways

  1. The cache is a thin helper over the data store: all durability, filters, and eviction metadata live in the store.

  2. Start with the simple cache, then add an eviction manager when you need TTL or size policies that a backend cannot offer on its own.

  3. Use the same filters and tooling described in the data store guide to inspect or clean cache entries.

API Reference

For more detailed information about the cache and how it relates to the data store, please refer to the API Reference page.
