🎯 Evaluator / Scorer

An Evaluator orchestrates evaluation workflows by coordinating metrics and evaluation logic. It acts as a manager that:

  • Executes relevant metrics

  • Aggregates and formats results

  • Generates human-readable explanations

All evaluators inherit from BaseEvaluator, an abstract base class that defines the core evaluation interface.
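As a rough sketch of this relationship, the following is a minimal stand-in with a toy metric; the names and signatures are illustrative only and may differ from the library's actual class definitions:

```python
from abc import ABC, abstractmethod

class BaseEvaluator(ABC):
    """Minimal stand-in for the abstract evaluation interface."""

    @abstractmethod
    def evaluate(self, data: dict) -> dict:
        """Run metrics on `data` and return aggregated, formatted results."""

class KeywordEvaluator(BaseEvaluator):
    """Toy evaluator: scores 1.0 if the expected answer appears in the response."""

    def evaluate(self, data: dict) -> dict:
        hit = data["expected_response"].lower() in data["generated_response"].lower()
        return {
            "score": 1.0 if hit else 0.0,
            "global_explanation": "Expected answer found." if hit else "Expected answer missing.",
        }

result = KeywordEvaluator().evaluate({
    "expected_response": "Paris",
    "generated_response": "Paris is the capital of France.",
})
print(result)  # {'score': 1.0, 'global_explanation': 'Expected answer found.'}
```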

Input & Output Types

An Evaluator accepts a dictionary containing the data to evaluate, based on MetricInput. Several ready-made data types are also provided, such as QAData, RAGData, and AgentData.

Example Input

{
  "query": "What is the capital of Spain?",
  "retrieved_context": "Madrid is the capital of Spain.",
  "expected_response": "Madrid is the capital of Spain.",
  "generated_response": "Barcelona is the capital of Spain."
}

The Evaluator returns an EvaluationOutput that includes keys such as global_explanation, score, and namespaced per-metric results.

Example Output

{
  "generation": {
    "global_explanation": "The following metrics failed to meet expectations:\n1. Completeness is 1 (should be 3)\n2. Groundedness is 1 (should be 3)",
    "relevancy_rating": "bad",
    "score": 0.0,
    "possible_issues": ["Retrieval Issue", "Generation Issue"],
    "completeness": {
      "score": 1,
      "explanation": "The response contains a critical factual contradiction. It identifies Barcelona as the capital of Spain, whereas the expected output correctly states that the capital is Madrid."
    },
    "groundedness": {
      "score": 1,
      "explanation": "The response provides a factually incorrect answer that directly contradicts the retrieval context, which explicitly states that Madrid is the capital of Spain."
    }
  }
}

Single vs Batch Evaluation

Evaluators support both modes via the same evaluate() method:

Single Evaluation

Batch Evaluation
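Both modes can be sketched as follows, using a hypothetical evaluator whose evaluate() accepts either a single dict or a list of dicts; the class and field names here are illustrative stand-ins, not the library's actual API:

```python
class ToyEvaluator:
    """Stand-in showing single-vs-batch dispatch through one evaluate() method."""

    def evaluate(self, data):
        # Batch mode: a list of records yields a list of results.
        if isinstance(data, list):
            return [self._evaluate_one(item) for item in data]
        # Single mode: one record yields one result.
        return self._evaluate_one(data)

    def _evaluate_one(self, item):
        correct = item["expected_response"] in item["generated_response"]
        return {"score": 1.0 if correct else 0.0}

evaluator = ToyEvaluator()

# Single evaluation: pass one dictionary.
single = evaluator.evaluate({
    "expected_response": "Paris",
    "generated_response": "Paris is the capital of France.",
})

# Batch evaluation: pass a list of dictionaries.
batch = evaluator.evaluate([
    {"expected_response": "Paris", "generated_response": "Paris is the capital of France."},
    {"expected_response": "Madrid", "generated_response": "Barcelona is the capital of Spain."},
])

print(single["score"])                  # 1.0
print([r["score"] for r in batch])      # [1.0, 0.0]
```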

Initialization & Common Parameters

All evaluators accept:

  • model: str | BaseLMInvoker

    • Use a string for quick setup (e.g., "openai/gpt-4o-mini", "anthropic/claude-3-5-sonnet"), or

    • Pass a BaseLMInvoker instance for more advanced configuration. See Language Model (LM) Invoker for more details and supported invokers.

Example Usage: Using OpenAIChatCompletionsLMInvoker

Note: OpenAICompatibleLMInvoker was removed in v0.6. Use OpenAIChatCompletionsLMInvoker with a base_url parameter to connect to OpenAI Chat Completions API-compatible providers.
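The two setup styles can be sketched as follows. The stand-in classes below only mirror the shapes described above; the real import paths, constructor parameters, and evaluator names may differ, so treat this as an assumption-laden illustration rather than the library's actual API:

```python
# Stand-in for the invoker class described above (illustrative only).
class OpenAIChatCompletionsLMInvoker:
    def __init__(self, model_name, base_url=None, api_key=None):
        self.model_name = model_name
        self.base_url = base_url    # points at an OpenAI-compatible endpoint
        self.api_key = api_key

# Stand-in evaluator accepting `model: str | BaseLMInvoker`.
class SomeEvaluator:
    def __init__(self, model):
        self.model = model

# Quick setup: pass a provider/model string.
quick = SomeEvaluator(model="openai/gpt-4o-mini")

# Advanced setup: pass an invoker pointed at a compatible provider.
invoker = OpenAIChatCompletionsLMInvoker(
    model_name="llama-3.1-70b",                  # hypothetical model name
    base_url="https://my-provider.example/v1",   # hypothetical endpoint
    api_key="YOUR_API_KEY",
)
advanced = SomeEvaluator(model=invoker)
```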

Available Evaluators


Looking for something else? Build your own custom evaluator here.

*All fields are optional and can be adjusted depending on the chosen metric.
