Reference Formatter
What’s a Reference Formatter?
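Judging from the Quickstart further down, a reference formatter takes a generated response plus a list of candidate chunks and reports which sources support the response. The class name `SimilarityBasedReferenceFormatter` and its `threshold` argument suggest that chunks are kept when their embedding is similar enough to the response. The sketch below illustrates that idea only. It is not the library's implementation: `ToyChunk`, `toy_embed`, `cosine`, and `select_references` are hypothetical names, the bag-of-words vectors stand in for a real embedding model, and cosine similarity is an assumed metric.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyChunk:
    """Hypothetical stand-in for a chunk: text content plus metadata."""
    content: str
    metadata: dict = field(default_factory=dict)


def toy_embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_references(response: str, chunks: list[ToyChunk], threshold: float) -> list[str]:
    """Return de-duplicated file names of chunks whose similarity meets the threshold."""
    response_vec = toy_embed(response)
    names: list[str] = []
    for chunk in chunks:
        if cosine(response_vec, toy_embed(chunk.content)) >= threshold:
            name = chunk.metadata.get("file_name", "unknown")
            if name not in names:
                names.append(name)
    return names


chunks = [
    ToyChunk("Indonesia is a country in Southeast Asia.", {"file_name": "indonesia.txt"}),
    ToyChunk("Malaysia is a country in Southeast Asia.", {"file_name": "malaysia.txt"}),
    ToyChunk("The capital of Indonesia is Jakarta.", {"file_name": "indonesia.txt"}),
    ToyChunk("The capital of Malaysia is Kuala Lumpur.", {"file_name": "malaysia.txt"}),
]
response = "Indonesia is a country in Southeast Asia. The capital of Indonesia is Jakarta."

references = select_references(response, chunks, threshold=0.7)
print(references)  # ['indonesia.txt']
```

With these toy vectors, lowering the threshold to 0.6 also admits `malaysia.txt`, because "Malaysia is a country in Southeast Asia." shares most of its words with the response. The threshold therefore trades recall of sources against the risk of citing unrelated ones.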
Installation
Linux/macOS
# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-generation"
Windows (Command Prompt)
# you can use a Conda environment
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-generation"
Quickstart
import asyncio

from gllm_core.schema import Chunk
from gllm_inference.builder import build_em_invoker
from gllm_generation.reference_formatter import SimilarityBasedReferenceFormatter

# Candidate chunks; each one records its source file in the metadata.
candidate_chunks = [
    Chunk(content="Indonesia is a country in Southeast Asia.", metadata={"file_name": "indonesia.txt"}),
    Chunk(content="Malaysia is a country in Southeast Asia.", metadata={"file_name": "malaysia.txt"}),
    Chunk(content="Singapore is a country in Southeast Asia.", metadata={"file_name": "singapore.txt"}),
    Chunk(content="The capital of Indonesia is Jakarta.", metadata={"file_name": "indonesia.txt"}),
    Chunk(content="The capital of Malaysia is Kuala Lumpur.", metadata={"file_name": "malaysia.txt"}),
    Chunk(content="The capital of Singapore is Singapore.", metadata={"file_name": "singapore.txt"}),
]

# The generated response whose sources should be identified.
response = "Indonesia is a country in Southeast Asia. The capital of Indonesia is Jakarta."

# Build the embedding model invoker and a formatter with a similarity threshold of 0.7.
em_invoker = build_em_invoker(model_id="openai/text-embedding-3-small")
ref_formatter = SimilarityBasedReferenceFormatter(em_invoker, threshold=0.7)

# asyncio.run drives the asynchronous format_reference call to completion.
references = asyncio.run(ref_formatter.format_reference(response=response, chunks=candidate_chunks))
print(references)

Format Customization
Returning Raw Chunks