🛠️ Create Custom Evaluator / Scorer

If the built-in evaluators don’t cover your use case, you can define your own! There are two main ways to create a custom evaluator:

1. Implementing a Custom Metric and Custom Evaluator

If you need highly customized evaluation logic, you can create your own classes by extending BaseMetric and BaseEvaluator and defining your evaluation logic from scratch.

Example Usage

import asyncio

from gllm_evals.dataset import load_simple_rag_dataset
from gllm_evals.evaluator.evaluator import BaseEvaluator
from gllm_evals.metrics.metric import BaseMetric
from gllm_evals.types import MetricInput, MetricOutput, EvaluationOutput

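# A custom metric that returns 1 when the generated response exactly
# matches the expected response, and 0 otherwise.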
class ExactMatchMetric(BaseMetric):
    def __init__(self):
        self.name = "exact_match"

    async def _evaluate(self, data: MetricInput) -> MetricOutput:
        score = int(data["generated_response"] == data["expected_response"])
        return {"score": score, "explanation": None}

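# A custom evaluator that delegates its evaluation to the ExactMatchMetric defined above.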
class ResponseEvaluator(BaseEvaluator):
    def __init__(self):
        super().__init__(name="response_evaluator")
        self.metric = ExactMatchMetric()

    async def _evaluate(self, data: MetricInput) -> EvaluationOutput:
        return await self.metric.evaluate(data)

async def main():
    evaluator = ResponseEvaluator()
    data = load_simple_rag_dataset()
    result = await evaluator.evaluate(data[0])
    print(result)

if __name__ == "__main__":
    asyncio.run(main())

Example Output

2. Combining Existing Metrics with CustomEvaluator

The easiest way to build your own evaluator is to combine any set of metrics into a CustomEvaluator. You can mix and match built-in metrics to tailor evaluation to your needs.

Example Usage
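The snippet below is a minimal sketch, not a verbatim example from the package: it assumes CustomEvaluator is importable from gllm_evals.evaluator.custom_evaluator and accepts a list of metric instances through a metrics argument, and it reuses the ExactMatchMetric class defined in the previous example in place of a specific built-in metric. Check the gllm_evals API reference for the exact import path, constructor signature, and the names of the built-in metrics you want to combine.

import asyncio

from gllm_evals.dataset import load_simple_rag_dataset
# Assumed import path; see the API reference for the actual location of CustomEvaluator.
from gllm_evals.evaluator.custom_evaluator import CustomEvaluator

async def main():
    # Assumed constructor: a name plus the list of metric instances to run.
    # Any mix of built-in metrics and your own BaseMetric subclasses can go in this list;
    # ExactMatchMetric here is the custom metric defined in the previous example.
    evaluator = CustomEvaluator(
        name="my_custom_evaluator",
        metrics=[ExactMatchMetric()],
    )
    data = load_simple_rag_dataset()
    result = await evaluator.evaluate(data[0])
    print(result)

if __name__ == "__main__":
    asyncio.run(main())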

Example Output

Which Method Should You Use?

| Use Case | Recommended Method |
| --- | --- |
| Implement custom logic | Extend BaseEvaluator |
| Combine existing metrics | CustomEvaluator |
