🎯 Evaluator / Scorer
Input & Output Types
Input:
{
  "query": "What is the capital of Spain?",
  "retrieved_context": "Madrid is the capital of Spain.",
  "expected_response": "Madrid",
  "generated_response": "Barcelona is the capital of Spain."
}

Output:
{
  "generation": {
    "global_explanation": "The following metrics failed to meet expectations:\n1. Completeness is 1 (should be 3)\n2. Groundedness is 1 (should be 3)",
    "relevancy_rating": "bad",
    "score": 0.0,
    "possible_issues": ["Retrieval Issue", "Generation Issue"],
    "completeness": {
      "score": 1,
      "explanation": "The response contains a critical factual contradiction. It identifies Barcelona as the capital of Spain, whereas the expected output correctly states that the capital is Madrid."
    },
    "groundedness": {
      "score": 1,
      "explanation": "The response provides a factually incorrect answer that directly contradicts the retrieval context, which explicitly states that Madrid is the capital of Spain."
    }
  }
}
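To make the shape of these payloads concrete, here is a minimal sketch that models them as Python TypedDicts. The type names (EvaluationInput, MetricResult, GenerationResult) are illustrative assumptions for this sketch, not types defined by the library:

from typing import TypedDict


class EvaluationInput(TypedDict):
    query: str               # The user question sent to the pipeline.
    retrieved_context: str   # Context returned by the retriever.
    expected_response: str   # Ground-truth (reference) answer.
    generated_response: str  # Answer produced by the generator.


class MetricResult(TypedDict):
    score: int               # Per-metric score, e.g. 1 (fail) to 3 (pass) in the example above.
    explanation: str         # Judge-written rationale for the score.


class GenerationResult(TypedDict):
    global_explanation: str     # Summary of which metrics missed expectations.
    relevancy_rating: str       # Overall verdict, e.g. "good" or "bad".
    score: float                # Aggregate score between 0.0 and 1.0.
    possible_issues: list[str]  # e.g. ["Retrieval Issue", "Generation Issue"].
    completeness: MetricResult
    groundedness: MetricResult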
Single vs Batch Evaluation
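An evaluator can score one record at a time or a list of records in one call; batch mode keeps the same contract (list in, list out) and lets an implementation parallelize judge calls. The self-contained sketch below illustrates this pattern with a stub Evaluator; the class and the evaluate / evaluate_batch method names are assumptions for illustration, not the library's confirmed API.

from typing import Any


class Evaluator:
    """Stub evaluator standing in for a real LLM-as-judge evaluator."""

    def evaluate(self, record: dict[str, Any]) -> dict[str, Any]:
        # A real evaluator would call a judge model here; this stub only
        # checks whether the generated answer contains the expected one.
        passed = record["expected_response"].lower() in record["generated_response"].lower()
        return {"score": 1.0 if passed else 0.0}

    def evaluate_batch(self, records: list[dict[str, Any]]) -> list[dict[str, Any]]:
        # Batch evaluation: list in -> list out, one result per input record.
        return [self.evaluate(r) for r in records]


evaluator = Evaluator()
record = {
    "query": "What is the capital of Spain?",
    "retrieved_context": "Madrid is the capital of Spain.",
    "expected_response": "Madrid",
    "generated_response": "Barcelona is the capital of Spain.",
}
print(evaluator.evaluate(record))          # single: one record, one result
print(evaluator.evaluate_batch([record]))  # batch: list in, list out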
Initialization & Common Parameters
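The sketch below illustrates the kind of parameters an LLM-as-judge evaluator commonly exposes at initialization: the judge model, the metrics to compute, and the per-metric passing threshold implied by the "(should be 3)" wording in the example output. All parameter names and defaults here are assumptions for illustration, not documented options.

from dataclasses import dataclass, field


@dataclass
class EvaluatorConfig:
    model: str = "gpt-4o-mini"   # Judge model identifier (assumed default).
    metrics: list[str] = field(
        default_factory=lambda: ["completeness", "groundedness"]
    )
    passing_score: int = 3       # Per-metric score a response must reach to pass.
    max_retries: int = 2         # Retries when the judge returns malformed output.


config = EvaluatorConfig(metrics=["completeness", "groundedness"], passing_score=3)
print(config)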
Example Usage – Using OpenAICompatibleLMInvoker
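The name OpenAICompatibleLMInvoker indicates that the judge model is reached over the standard OpenAI chat-completions protocol, so any compatible endpoint (vLLM, Ollama, a hosted provider) can serve it. The sketch below shows that underlying call using the openai client directly; the invoker's own constructor and arguments are not documented here and may differ.

from openai import OpenAI

# Point the standard OpenAI client at an OpenAI-compatible endpoint.
# The base_url, api_key, and model name are placeholder assumptions.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # e.g. a self-hosted vLLM server
    api_key="EMPTY",                      # many local servers ignore the key
)

judge_prompt = (
    "Rate the groundedness of the answer against the context on a 1-3 scale, "
    "then explain your rating.\n"
    "Context: Madrid is the capital of Spain.\n"
    "Answer: Barcelona is the capital of Spain."
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # whatever model the endpoint serves
    messages=[{"role": "user", "content": judge_prompt}],
)
print(response.choices[0].message.content)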
Available Evaluators