🔢 Metric

Metrics are the core evaluation components in the gllm-evals framework. They define specific ways to measure and assess the quality of language model generation, retrieval systems, and agent behaviors.

Metrics work in conjunction with evaluators to provide comprehensive evaluation capabilities. Evaluators can run multiple metrics in parallel or sequentially and combine their results into a single evaluation report.

Example Usage

import asyncio
import os

from gllm_evals import load_simple_qa_dataset
from gllm_evals.metrics import GEvalCompletenessMetric

async def main():
    # Load the bundled simple QA example dataset.
    data = load_simple_qa_dataset()
    # GEval-style metrics use an LLM judge, so model credentials are required.
    metric = GEvalCompletenessMetric(model_credentials=os.getenv("GOOGLE_API_KEY"))
    # Evaluate a single example from the dataset and collect the metric result.
    result = await metric.evaluate(data.dataset[0])
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
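
The multi-metric workflow described earlier can also be approximated with the metric API alone by awaiting several metrics concurrently. The sketch below is illustrative rather than the framework's confirmed API: GEvalRelevanceMetric is an assumed class name, so substitute any metric from the Metrics directory.

import asyncio
import os

from gllm_evals import load_simple_qa_dataset
# GEvalRelevanceMetric is an assumed name, used here for illustration only.
from gllm_evals.metrics import GEvalCompletenessMetric, GEvalRelevanceMetric

async def main():
    data = load_simple_qa_dataset()
    credentials = os.getenv("GOOGLE_API_KEY")
    metrics = [
        GEvalCompletenessMetric(model_credentials=credentials),
        GEvalRelevanceMetric(model_credentials=credentials),
    ]
    # Run every metric against the same example concurrently.
    results = await asyncio.gather(*(m.evaluate(data.dataset[0]) for m in metrics))
    for metric, result in zip(metrics, results):
        print(type(metric).__name__, result)

if __name__ == "__main__":
    asyncio.run(main())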

Available Metrics

The framework ships with several built-in metrics, such as the GEvalCompletenessMetric used above. To view the full list, see the Metrics directory.
