Vector Retriever

What's a Vector Retriever?

The Vector Retriever is the most commonly used retriever type for document-based applications. It retrieves documents and information from vector databases using semantic similarity search.

Best For:

Document search and retrieval
Semantic similarity matching
Large-scale text corpora
Unstructured data search

Key Features:

Embedding-based similarity search
Support for multiple vector databases (Chroma, Elasticsearch, Redis)
Metadata filtering and scoring
Configurable similarity thresholds

Use Cases:

Document Q&A systems
Content recommendation engines
Semantic search applications
Knowledge base retrieval

Prerequisites

This example specifically requires completion of all setup steps listed on the Prerequisites page.

You should be familiar with these concepts:

Installation

# Linux/macOS (you can use a Conda environment)
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-retrieval"

REM Windows Command Prompt (you can use a Conda environment)
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/"  "gllm-retrieval"

What it does

The Vector Retriever is a component that retrieves relevant documents from a vector database based on semantic similarity to a query. It provides a standardized interface for document retrieval operations in Gen AI applications.
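
To make "semantic similarity" concrete, here is a minimal sketch of the ranking idea behind embedding-based retrieval: score every stored vector against the query vector, drop scores below a threshold, and keep the best k. It is an illustration only, not the library's implementation. The function names, the toy 3-dimensional vectors, and the threshold value are all assumptions made for this example; a real setup would use an embedding model and a vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_similarity(query_vec, doc_vecs, k=2, threshold=0.0):
    """Score all documents, keep scores >= threshold, return the best k."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Toy "embeddings" (hypothetical values, not real model output)
doc_vecs = {
    "ml_intro": [0.9, 0.1, 0.0],
    "cooking": [0.0, 0.2, 0.9],
    "deep_learning": [0.8, 0.3, 0.1],
}
query_vec = [1.0, 0.0, 0.0]

ranked = top_k_by_similarity(query_vec, doc_vecs, k=2, threshold=0.5)
print(ranked)
```

In this toy data the "cooking" vector points in a different direction from the query, so it falls below the threshold and is excluded, while the two machine-learning vectors are returned in order of similarity.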

Inputs

Query: A text string representing the search query
Data Store: A vector data store instance (e.g., Chroma, Elasticsearch, Redis)
Top-k: Maximum number of documents to retrieve (optional; a default value is used when omitted)
Retrieval Parameters: Additional parameters for fine-tuning the search (optional)

Outputs

List of Chunks: A list of Chunk objects containing the retrieved documents with their metadata
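
The input and output contract above can be sketched with a small in-memory stand-in. Everything here is hypothetical: ToyChunk and ToyRetriever are illustration-only names, the top_k keyword is an assumed spelling of the Top-k input, and word overlap replaces real embedding similarity. The sketch only shows the shape of the call: a query goes in, an optional top-k caps the result size, and a list of chunk objects with metadata comes out.

```python
import asyncio
import re
from dataclasses import dataclass, field


@dataclass
class ToyChunk:
    """Hypothetical stand-in for a Chunk: text content plus metadata."""
    content: str
    metadata: dict = field(default_factory=dict)
    score: float = 0.0


def words(text):
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


class ToyRetriever:
    """Hypothetical in-memory retriever mirroring the inputs listed above."""

    def __init__(self, chunks, default_top_k=2):
        self.chunks = chunks
        self.default_top_k = default_top_k

    async def retrieve(self, query, top_k=None, retrieval_params=None):
        # retrieval_params is accepted but ignored in this sketch;
        # a real data store would interpret it.
        k = top_k if top_k is not None else self.default_top_k
        terms = words(query)
        scored = [
            # Word overlap stands in for embedding similarity
            ToyChunk(c.content, c.metadata, float(len(terms & words(c.content))))
            for c in self.chunks
        ]
        scored.sort(key=lambda c: c.score, reverse=True)
        return scored[:k]


retriever = ToyRetriever([
    ToyChunk("Machine learning is a subset of AI", {"type": "document"}),
    ToyChunk("Bread needs flour and yeast", {"type": "recipe"}),
    ToyChunk("Deep learning uses neural networks", {"type": "document"}),
])

results = asyncio.run(retriever.retrieve("What is machine learning?"))
for chunk in results:
    print(chunk.score, chunk.content, chunk.metadata)
```

Omitting top_k falls back to the retriever's own default, which mirrors how the optional Top-k input is described above.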

Save and Retrieve Data

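import asyncio

from gllm_datastore.vector_data_store.chroma_vector_data_store import ChromaVectorDataStore
# Assumed import path for OpenAIEMInvoker; adjust to match your installed SDK version
from gllm_inference.em_invoker import OpenAIEMInvoker
from gllm_retrieval.retriever.vector_retriever import BasicVectorRetriever


async def main():
    # Initialize Chroma data store in-memory
    vector_store = ChromaVectorDataStore(
        collection_name="documents",
        embedding=OpenAIEMInvoker(model_name="text-embedding-3-small"),
    )

    # Initialize the vector retriever
    retriever = BasicVectorRetriever(vector_store)

    # Perform a basic retrieval; the result is a list of Chunk objects
    query = "What is machine learning?"
    results = await retriever.retrieve(query)
    print(results)


# retrieve() is awaited, so it must run inside an event loop
asyncio.run(main())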

Beyond the query string, metadata filtering can be applied through the retrieval_params parameter. The accepted keys follow the retrieval parameters of the specific data store in use. For example, with ChromaVectorDataStore, retrieval_params can be used as follows:

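# Data-store-specific parameters (here: ChromaVectorDataStore)
retrieval_params = {
    # Metadata filter: keep only entries whose "type" metadata is "document"
    "filter": {
        "$and": [
            {"type": "document"},
        ]
    },
    # Content filter on the stored document text
    "where_document": {"$contains": {"text": "AI"}},
}
query = "What is machine learning?"
# Reuses the retriever created earlier; await must run inside an async function
results = await retriever.retrieve(query, retrieval_params=retrieval_params)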
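
To show what a metadata filter of this shape does to the candidate set, here is a minimal sketch of an evaluator for equality conditions combined with $and and $or. The matches helper and the sample records are hypothetical and exist only for illustration; the actual filtering happens inside the data store, which may support more operators and apply different rules.

```python
def matches(metadata, where):
    """Return True if metadata satisfies the filter (equality, $and, $or only)."""
    for key, condition in where.items():
        if key == "$and":
            # Every sub-filter must match
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            # At least one sub-filter must match
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif metadata.get(key) != condition:
            # Plain key/value pair: exact equality on that metadata field
            return False
    return True


# Hypothetical stored records with metadata
records = [
    {"id": "a", "metadata": {"type": "document", "lang": "en"}},
    {"id": "b", "metadata": {"type": "image", "lang": "en"}},
    {"id": "c", "metadata": {"type": "document", "lang": "id"}},
]

where = {"$and": [{"type": "document"}, {"lang": "en"}]}
kept = [r["id"] for r in records if matches(r["metadata"], where)]
print(kept)
```

Only records whose metadata satisfies every condition remain eligible; similarity ranking then applies to that reduced set.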
