ClassicalRetrievalEvaluator

Use when: You want to evaluate retrieval performance with classical information retrieval (IR) metrics: MAP, NDCG, Precision, Recall, and Top-K Accuracy.
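
These metrics compare the ranked list of retrieved chunks against the set of relevant chunk ids. As a reference for what two of them measure, here is a minimal, library-independent sketch of Precision@k and Recall@k (the function names are illustrative, not part of gllm_evals):

def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the top-k retrieved chunks that are relevant."""
    return sum(1 for cid in ranked_ids[:k] if cid in relevant_ids) / k


def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of all relevant chunks that appear in the top-k results."""
    return sum(1 for cid in ranked_ids[:k] if cid in relevant_ids) / len(relevant_ids)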

Fields:

  1. retrieved_chunks (dict[str, float]) — A dictionary mapping each retrieved document/chunk id to its retrieval score (see the sorting note after this list).

  2. ground_truth_chunk_ids (list[str]) — A list of reference chunk ids marking the chunks that are relevant.
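
Note that retrieved_chunks itself is unordered; the ranking used by the rank-based metrics is presumably derived by sorting chunk ids by descending score, along these lines:

ranked_ids = sorted(retrieved_chunks, key=retrieved_chunks.get, reverse=True)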

Example Usage

import asyncio

from gllm_evals.evaluator.classical_retrieval_evaluator import ClassicalRetrievalEvaluator
from gllm_evals.types import RetrievalData


async def main():
    """Evaluate a toy retrieval result with classical IR metrics."""
    data = RetrievalData(
        # Retrieval scores keyed by chunk id; higher scores rank higher.
        retrieved_chunks={
            "chunk1": 9.0,
            "chunk2": 0.0,
            "chunk3": 0.3,
            "chunk4": 0.1,
            "chunk5": 0.2,
            "chunk6": 0.4,
            "chunk7": 0.5,
            "chunk8": 0.6,
            "chunk9": 0.7,
            "chunk10": 0.8,
        },
        # Three of the ten retrieved chunks are marked as relevant.
        ground_truth_chunk_ids=["chunk9", "chunk3", "chunk2"],
    )
    # Evaluate at cutoffs k=5 and k=10.
    evaluator = ClassicalRetrievalEvaluator(k=[5, 10])
    results = await evaluator.evaluate(data)
    print(results)


if __name__ == "__main__":
    asyncio.run(main())
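
Note that evaluator.evaluate is awaited, so it is a coroutine and must run inside an event loop; asyncio.run(main()) provides one here.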

Example Output
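
Running the script prints the evaluation results. Given the configuration above, these should contain the classical IR metrics (MAP, NDCG, Precision, Recall, Top-K Accuracy) computed at each configured cutoff, here k=5 and k=10; the exact structure of the printed results depends on the library.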
