Relevance Filter
gllm-generation | Involves EM | Involves LM | Tutorial: Relevance Filter | API Reference
What’s a Relevance Filter?
The relevance filter is a utility module that filters context chunks based on their relevance to the user query. In this tutorial, you'll learn how to use the SimilarityBasedRelevanceFilter.
Installation
Linux, macOS, or WSL (bash):
# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-misc"

Windows PowerShell:
# you can use a Conda environment
$token = (gcloud auth print-access-token)
pip install --extra-index-url "https://oauth2accesstoken:$token@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-misc"

Windows Command Prompt:
REM you can use a Conda environment
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-misc"

Quickstart
Let’s jump into a basic example using SimilarityBasedRelevanceFilter. Since it relies on an embedding model (EM), we can build one simply by passing an EM invoker. We can also set a threshold to control how strictly the candidate chunks are filtered.
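Before running the library example, it may help to see what a similarity threshold does in isolation. The sketch below is not the library's implementation: it assumes the filter compares embeddings with cosine similarity and keeps chunks scoring at or above the threshold, and it uses hand-made 3-dimensional toy vectors in place of a real embedding model. The names `cosine_similarity`, `filter_by_similarity`, `toy_query`, and `toy_chunks` are hypothetical and exist only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_similarity(query_vector, chunk_vectors, threshold):
    """Return the names of chunks whose similarity to the query is >= threshold.

    Assumption: the comparison is inclusive (>=); the real filter may differ.
    """
    return [
        name
        for name, vector in chunk_vectors.items()
        if cosine_similarity(query_vector, vector) >= threshold
    ]


# Toy vectors standing in for real embeddings (purely illustrative values).
toy_query = [1.0, 1.0, 0.0]
toy_chunks = {
    "indonesia_region": [1.0, 0.0, 0.0],
    "indonesia_capital": [0.0, 1.0, 0.0],
    "malaysia_capital": [0.0, 0.2, 1.0],
}

# A low threshold keeps weakly related chunks; a higher one drops them.
kept_loose = filter_by_similarity(toy_query, toy_chunks, threshold=0.1)
kept_strict = filter_by_similarity(toy_query, toy_chunks, threshold=0.6)
print(kept_loose)
print(kept_strict)
```

Raising the threshold makes the filter stricter: fewer chunks pass, and only those closest to the query remain. The library example that follows applies the same idea with a real EM invoker.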
import asyncio

from gllm_core.schema import Chunk
from gllm_inference.builder import build_em_invoker
from gllm_generation.relevance_filter import SimilarityBasedRelevanceFilter

# Candidate context chunks; only some of them relate to the query.
candidate_chunks = [
    Chunk(content="Indonesia is a country in Southeast Asia.", metadata={"file_name": "indonesia.txt"}),
    Chunk(content="Malaysia is a country in Southeast Asia.", metadata={"file_name": "malaysia.txt"}),
    Chunk(content="Singapore is a country in Southeast Asia.", metadata={"file_name": "singapore.txt"}),
    Chunk(content="The capital of Indonesia is Jakarta.", metadata={"file_name": "indonesia.txt"}),
    Chunk(content="The capital of Malaysia is Kuala Lumpur.", metadata={"file_name": "malaysia.txt"}),
    Chunk(content="The capital of Singapore is Singapore.", metadata={"file_name": "singapore.txt"}),
]

query = "In what part of Asia is Indonesia located? And what's its capital city?"

# Build an embedding model (EM) invoker and pass it to the filter.
em_invoker = build_em_invoker(model_id="openai/text-embedding-3-small")
# The threshold controls how strictly the candidate chunks are filtered.
relevance_filter = SimilarityBasedRelevanceFilter(em_invoker, threshold=0.6)

# filter() is asynchronous, so run it with asyncio.run().
filtered_chunks = asyncio.run(relevance_filter.filter(chunks=candidate_chunks, query=query))
print(filtered_chunks)

Expected Output