Relevance Filter

gllm-generation | Involves EM | Involves LM | Tutorial: Relevance Filter | API Reference

What’s a Relevance Filter?

The relevance filter is a utility module that filters context chunks based on their relevance to the user query. In this tutorial, you'll learn how to use the SimilarityBasedRelevanceFilter.
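Conceptually, a similarity-based filter embeds the query and each chunk, scores every chunk by its similarity to the query, and keeps the chunks that pass a threshold. The sketch below illustrates that idea with cosine similarity over hand-picked toy vectors standing in for real embeddings. `cosine_similarity` and `filter_by_similarity` are hypothetical helpers, not part of gllm-generation, and the library's actual scoring and comparison (for example, `>=` versus `>`) may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated to everything
    return dot / (norm_a * norm_b)


def filter_by_similarity(
    query_vec: list[float],
    chunks: list[tuple[str, list[float]]],
    threshold: float,
) -> list[str]:
    """Keep chunk texts whose similarity to the query is at least `threshold`."""
    return [
        text
        for text, vec in chunks
        if cosine_similarity(query_vec, vec) >= threshold
    ]


# Toy 3-dimensional "embeddings" (hypothetical, hand-picked for illustration).
query_vec = [1.0, 1.0, 0.0]
chunks = [
    ("Indonesia is a country in Southeast Asia.", [1.0, 0.2, 0.0]),
    ("The capital of Indonesia is Jakarta.", [0.3, 1.0, 0.1]),
    ("The capital of Malaysia is Kuala Lumpur.", [0.0, 0.3, 1.0]),
]

# Keeps the two Indonesia chunks (scores of about 0.83 and 0.88);
# the Malaysia chunk scores about 0.20 and is dropped.
print(filter_by_similarity(query_vec, chunks, threshold=0.6))
```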

Prerequisites

This example specifically requires completion of all setup steps listed on the Prerequisites page.

You should be familiar with these concepts:

Installation

# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-misc"

Quickstart

Let’s jump into a basic example using SimilarityBasedRelevanceFilter. Since it uses an embedding model, we only need to pass an EM invoker to build one. We can also set a threshold to control how strictly the candidate chunks are filtered.
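Before the library example, here is a rough intuition for the threshold: raising it never keeps more chunks, so a higher value is stricter and a lower value lets more loosely related chunks through. The sketch below applies several thresholds to a few made-up similarity scores (hypothetical values, not real `text-embedding-3-small` output); `keep_above` is a hypothetical helper, and the `>=` comparison is an assumption about how the library treats scores equal to the threshold.

```python
# Hypothetical similarity scores between the query and each chunk
# (made-up numbers for illustration, not real embedding output).
scores = {
    "Indonesia is a country in Southeast Asia.": 0.72,
    "The capital of Indonesia is Jakarta.": 0.68,
    "Malaysia is a country in Southeast Asia.": 0.55,
    "The capital of Singapore is Singapore.": 0.31,
}


def keep_above(scores: dict[str, float], threshold: float) -> list[str]:
    """Return chunk texts whose score is at least `threshold`, in input order."""
    return [text for text, score in scores.items() if score >= threshold]


# A stricter (higher) threshold never keeps more chunks than a looser one.
for threshold in (0.3, 0.6, 0.7):
    kept = keep_above(scores, threshold)
    print(f"threshold={threshold}: {len(kept)} chunk(s) kept")
```

With these toy scores, 0.3 keeps all four chunks, 0.6 keeps the two Indonesia chunks, and 0.7 keeps only the first one.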

import asyncio
from gllm_core.schema import Chunk
from gllm_inference.builder import build_em_invoker
from gllm_generation.relevance_filter import SimilarityBasedRelevanceFilter

candidate_chunks = [
    Chunk(content="Indonesia is a country in Southeast Asia.", metadata={"file_name": "indonesia.txt"}),
    Chunk(content="Malaysia is a country in Southeast Asia.", metadata={"file_name": "malaysia.txt"}),
    Chunk(content="Singapore is a country in Southeast Asia.", metadata={"file_name": "singapore.txt"}),
    Chunk(content="The capital of Indonesia is Jakarta.", metadata={"file_name": "indonesia.txt"}),
    Chunk(content="The capital of Malaysia is Kuala Lumpur.", metadata={"file_name": "malaysia.txt"}),
    Chunk(content="The capital of Singapore is Singapore.", metadata={"file_name": "singapore.txt"}),
]
query = "In what part of Asia is Indonesia located? And what's its capital city?"

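# Build an embedding model invoker and a filter with a similarity threshold of 0.6
em_invoker = build_em_invoker(model_id="openai/text-embedding-3-small")
relevance_filter = SimilarityBasedRelevanceFilter(em_invoker, threshold=0.6)

# filter() is asynchronous, so run it in an event loop with asyncio.run
filtered_chunks = asyncio.run(relevance_filter.filter(chunks=candidate_chunks, query=query))
print(filtered_chunks)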

Expected Output
