🚀 Getting Started

Introduction

This tutorial will guide you step-by-step on how to install the GenAI Evaluator SDK and run your first evaluation.

Prerequisites

Before installing, make sure you have:

  1. gcloud CLI - required because gllm-evals is a private library hosted in a private Google Cloud repository

Our internal gllm-evals package is hosted in a secure Google Cloud Artifact Registry, so you must authenticate via the gcloud CLI before you can download the package during installation. After installing the gcloud CLI, run

gcloud auth login

to authorize gcloud to access Google Cloud with your Google user credentials.

Installation

Run the following command to install gllm-evals together with the deepeval, langchain, and ragas extras:

pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-evals[deepeval,langchain,ragas]"

Environment Setup

Set a valid language model credential as an environment variable.

  • In this example, let's use an OpenAI API key.

Get an OpenAI API key from OpenAI Console.

export OPENAI_API_KEY="sk-..."
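Scripts that forget this step tend to fail later with an opaque authentication error, so it can help to fail fast at startup. A minimal sketch (the OPENAI_API_KEY variable name comes from the step above; the helper function and its error message are illustrative, not part of the SDK):

```python
import os

def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, raising early if it is unset."""
    key = os.getenv(name)
    if not key:
        raise RuntimeError(
            f'{name} is not set. Run: export {name}="sk-..."'
        )
    return key
```

Calling require_api_key() at the top of your evaluation script turns a missing credential into an immediate, self-explanatory error.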

Running Your First Evaluation

In this tutorial, we will evaluate RAG pipeline output.

1. Create a script called eval.py

import asyncio
import os
from gllm_evals.evaluator.geval_generation_evaluator import GEvalGenerationEvaluator
from gllm_evals.types import RAGData

async def main():
    evaluator = GEvalGenerationEvaluator(
        model="openai/gpt-4.1",
        model_credentials=os.getenv("OPENAI_API_KEY")
    )

    data = RAGData(
        query="What is the capital of France?",
        expected_response="Paris",
        generated_response="New York",
        retrieved_context="Paris is the capital of France.",
    )

    result = await evaluator.evaluate(data)
    print(result)

if __name__ == "__main__":
    asyncio.run(main())

2. Run the script

python eval.py

3. The evaluator will print the evaluation result for the given input, e.g.:

{
    'geval_generation_evaluator': {
        'relevancy_rating': 'bad',
        'possible_issues': ['Retrieval Issue', 'Generation Issue'],
        'score': 0,
        'completeness': {
            'score': 1,
            'reason': "The expected output contains one substantive statement: 'Paris' as the capital of France. The actual output, 'New York', does not match this statement and is a critical error regarding the key information."
        },
        'groundedness': {
            'score': 3,
            'reason': "The response 'New York' is not mentioned in the retrieved context and is factually incorrect since the context clearly states that Paris is the capital of France. This is a critical factual mistake, rendering the answer fully unsupported by the context or the question."
        },
        'redundancy': {
            'score': 1,
            'reason': 'The response provides a single, incorrect answer without any repetition, restatement, or elaboration. Only one idea is presented, making it concise and to the point, regardless of accuracy.'
        }
    }
}
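If you want to post-process results programmatically, the nested dictionary above can be flattened into per-metric scores. A hedged sketch, assuming the result is a plain dict shaped exactly like the example output (the metric_scores helper is ours, not part of the SDK; reasons are shortened here):

```python
# Example result shaped like the output above (hardcoded for illustration).
result = {
    "geval_generation_evaluator": {
        "relevancy_rating": "bad",
        "possible_issues": ["Retrieval Issue", "Generation Issue"],
        "score": 0,
        "completeness": {"score": 1, "reason": "..."},
        "groundedness": {"score": 3, "reason": "..."},
        "redundancy": {"score": 1, "reason": "..."},
    }
}

def metric_scores(result: dict) -> dict:
    """Collect per-metric scores from an evaluator result of this shape."""
    scores = {}
    for _evaluator, payload in result.items():
        for key, value in payload.items():
            # Metric entries are nested dicts carrying a "score" field.
            if isinstance(value, dict) and "score" in value:
                scores[key] = value["score"]
    return scores

print(metric_scores(result))  # {'completeness': 1, 'groundedness': 3, 'redundancy': 1}
```

This kind of flattening is convenient when aggregating scores across many evaluated examples, e.g. into a DataFrame or CSV.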

Next Steps

You're now ready to start using our evaluators. We offer several prebuilt evaluators to get you started.

Looking for something else? You can also build your own custom evaluator.
