Vector Data Store

What's a Vector Data Store?

Vector datastores are specialized for storing and searching high-dimensional vector embeddings. They are essential for:

  • Semantic search and similarity matching

  • Recommendation systems based on content similarity

  • Document and information retrieval

  • AI/ML applications requiring embedding storage

Available Implementations: see the Supported Vector Data Store page

Prerequisites

This example requires completion of all the setup steps listed on the Prerequisites page.

Installation

# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ gllm-datastore

Save and Retrieve Data

Let's walk through a practical example of using a vector datastore. The example below shows how to get started quickly and demonstrates common patterns you'll use in your own projects.
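
To make the walkthrough concrete, here is a minimal sketch of saving and retrieving documents. ChromaVectorDataStore and query() are taken from this page; the import path, constructor arguments, add_records() helper, and top_k parameter are assumptions for illustration, so check the gllm-datastore API reference for the exact names.

from gllm_datastore.vector import ChromaVectorDataStore  # hypothetical import path

# Create (or connect to) a Chroma-backed collection.
data_store = ChromaVectorDataStore(collection_name="articles")  # hypothetical constructor arguments

# Save a few documents; embeddings are typically produced by a configured embedding model.
data_store.add_records(  # hypothetical method name
    [
        {"id": "1", "content": "How to brew pour-over coffee", "metadata": {"topic": "coffee"}},
        {"id": "2", "content": "A beginner's guide to espresso", "metadata": {"topic": "coffee"}},
        {"id": "3", "content": "Training a small language model", "metadata": {"topic": "ml"}},
    ]
)

# Retrieve the documents most semantically similar to a query string.
results = data_store.query("making coffee at home", top_k=2)  # top_k is assumed
for result in results:
    print(result)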

Metadata Filtering

Instead of relying solely on a string for semantic queries, we can also apply metadata filtering through the retrieval_params parameter in the query() method. For example, in ChromaVectorDataStore, retrieval_params can be used as follows:
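
The sketch below continues the example above. retrieval_params and query() are named on this page; the "where" filter structure follows Chroma's filter convention and should be treated as an assumption until confirmed against the SDK reference.

# Filter by metadata in addition to the semantic query string.
results = data_store.query(
    "making coffee at home",
    top_k=5,                                          # assumed parameter
    retrieval_params={"where": {"topic": "coffee"}},  # only records whose metadata topic is "coffee"
)
for result in results:
    print(result)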

Use as a Cache

One of the important features of the GL SDK is the ability to use vector datastores as a cache. This is perfect for applications that need to cache expensive operations like API calls, database queries, or AI model inferences. The .as_cache() method transforms any vector datastore into a sophisticated caching system with three different matching strategies.

Quick Start with .as_cache()

The .as_cache() method is your gateway to intelligent caching. It converts a vector datastore into a cache with configurable matching strategies:
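
As a rough sketch of the pattern (continuing the example above): .as_cache() and the three strategy names come from this page, while the key_matching keyword and the get()/set() helpers are hypothetical placeholders for whatever the SDK actually exposes.

def call_expensive_model(prompt: str) -> str:
    # Placeholder for an expensive operation (API call, database query, model inference).
    return f"(expensive answer for: {prompt})"

# Turn the vector datastore into a cache with a chosen matching strategy.
cache = data_store.as_cache(key_matching="semantic")  # hypothetical keyword argument

question = "What is a vector datastore?"
answer = cache.get(question)        # hypothetical cache-lookup helper
if answer is None:
    answer = call_expensive_model(question)
    cache.set(question, answer)     # hypothetical cache-write helper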

Three Types of Cache Matching

The GL SDK provides three sophisticated matching strategies, each perfect for different use cases:

1) Exact Matching ("exact")

Perfect for when you need precise key matching. This is the fastest and most reliable option for caching operations where the input must match exactly.

Best for: API responses, database query results, function outputs where exact input matching is required.
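
A short sketch of the idea, reusing the hypothetical cache helpers from the quick-start example above; only the "exact" strategy name comes from this page.

exact_cache = data_store.as_cache(key_matching="exact")  # hypothetical keyword argument

exact_cache.set("SELECT * FROM orders WHERE id = 42", '{"id": 42, "status": "shipped"}')  # hypothetical helper
print(exact_cache.get("SELECT * FROM orders WHERE id = 42"))  # hit: the key matches exactly
print(exact_cache.get("select * from orders where id = 42"))  # miss: any difference in the key fails an exact match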

2) Fuzzy Matching ("fuzzy")

Ideal for handling typos, slight variations, or minor differences in input. Uses Levenshtein distance to find close matches.

Best for: User queries, search terms, natural language inputs where minor variations are common.
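
A sketch under the same assumptions as above; only the "fuzzy" strategy name and the Levenshtein-distance behavior come from this page.

fuzzy_cache = data_store.as_cache(key_matching="fuzzy")  # hypothetical keyword argument; a distance threshold may also be configurable

fuzzy_cache.set("how to reset my password", "Go to Settings > Security > Reset password.")  # hypothetical helper
# A query with a small typo can still hit the cache, because keys are
# compared by Levenshtein (edit) distance rather than exact equality.
print(fuzzy_cache.get("how to reset my pasword"))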

3) Semantic Matching ("semantic")

The most intelligent option! Uses vector embeddings to find semantically similar content, even when the exact words don't match.
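
A sketch under the same assumptions; only the "semantic" strategy name and the embedding-based matching behavior come from this page.

semantic_cache = data_store.as_cache(key_matching="semantic")  # hypothetical keyword argument

semantic_cache.set("How do I change my account password?", "Go to Settings > Security > Reset password.")  # hypothetical helper
# A differently worded but semantically equivalent question can still hit the
# cache, because keys are compared via vector embeddings rather than raw strings.
print(semantic_cache.get("What's the way to update my login credentials?"))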
