🚀Getting Started

Introduction

This tutorial walks you through installing the GenAI Evaluator SDK and running your first evaluation, step by step.

Prerequisites

Before installing, make sure you have:

Python 3.11+
pip or Poetry
An OpenAI API key
gcloud CLI: required because gllm-evals is a private library hosted in a private Google Cloud repository

After installing the gcloud CLI, run

gcloud auth login

to authorize gcloud to access the Cloud Platform with Google user credentials.

The internal gllm-evals package is hosted in a private Google Cloud Artifact Registry repository, so you must authenticate through the gcloud CLI before the package can be downloaded during installation.
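Before moving on, you may want to confirm these prerequisites in one go. The following is a minimal sketch, not part of gllm-evals: `check_prerequisites` is a hypothetical helper that checks the Python version, looks for pip or Poetry, and checks that `gcloud` is on your `PATH` and can print an access token (which typically fails until you have run `gcloud auth login`). It does not validate your OpenAI API key.

```python
import importlib.util
import shutil
import subprocess
import sys


def check_prerequisites() -> list[str]:
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper for illustration only; it is not part of gllm-evals.
    """
    problems = []

    # The SDK requires Python 3.11 or newer.
    if sys.version_info < (3, 11):
        problems.append(f"Python 3.11+ required, found {sys.version.split()[0]}")

    # Either pip or Poetry is enough; pip may exist as a module without a `pip` script.
    has_pip = importlib.util.find_spec("pip") is not None or shutil.which("pip") is not None
    if not has_pip and shutil.which("poetry") is None:
        problems.append("Neither pip nor Poetry was found")

    gcloud = shutil.which("gcloud")
    if gcloud is None:
        problems.append("gcloud CLI not found on PATH")
    else:
        # Printing a token typically fails until `gcloud auth login` has been run.
        try:
            probe = subprocess.run(
                [gcloud, "auth", "print-access-token"],
                capture_output=True,
                text=True,
                timeout=60,
            )
            if probe.returncode != 0:
                problems.append("gcloud is not authenticated; run `gcloud auth login`")
        except subprocess.TimeoutExpired:
            problems.append("gcloud did not respond; check your gcloud setup")

    return problems
```

For example, `print(check_prerequisites() or "All prerequisites found")` lists anything that still needs attention.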

Installation

To install with pip, run:

pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-evals[deepeval,langchain,ragas]"
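If you drive the installation from a Python script (for example, a project setup script), the same command can be assembled as an argument list. This is a sketch under assumptions: `get_access_token` and `build_pip_install_args` are hypothetical helpers, and they simply mirror the shell command above, using the token as the password for the user name `oauth2accesstoken`.

```python
import shutil
import subprocess
import sys
from urllib.parse import quote

INDEX = "glsdk.gdplabs.id/gen-ai-internal/simple/"
PACKAGE = "gllm-evals[deepeval,langchain,ragas]"


def get_access_token() -> str:
    # Same as $(gcloud auth print-access-token) in the shell command.
    gcloud = shutil.which("gcloud") or "gcloud"
    out = subprocess.run(
        [gcloud, "auth", "print-access-token"],
        capture_output=True,
        text=True,
        check=True,  # raise if gcloud is not logged in
    )
    return out.stdout.strip()


def build_pip_install_args(token: str) -> list[str]:
    # The token is the password for the fixed user name "oauth2accesstoken";
    # quote() escapes any character that is not safe inside a URL.
    index_url = f"https://oauth2accesstoken:{quote(token, safe='')}@{INDEX}"
    # `python -m pip` installs into the interpreter that runs this script.
    return [sys.executable, "-m", "pip", "install", "--extra-index-url", index_url, PACKAGE]


# Usage (requires gcloud to be installed and logged in):
# subprocess.run(build_pip_install_args(get_access_token()), check=True)
```

As with the shell version, the token ends up in the process arguments, so avoid logging the returned list.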

To install with Poetry instead:

Step 1: Add the gen-ai-internal source to your pyproject.toml

poetry source add gen-ai-internal "https://asia-southeast2-python.pkg.dev/gdp-labs/gen-ai-internal/simple/" --priority supplemental

Step 2: Configure the authentication

poetry config http-basic.gen-ai-internal oauth2accesstoken "$(gcloud auth print-access-token)"

Step 3: Add the package to your project

poetry add "gllm-evals[deepeval,langchain,ragas]"
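Tokens printed by `gcloud auth print-access-token` are short-lived (typically about one hour), so the credential stored in Step 2 can expire and later Poetry commands may fail to authenticate. Re-running Step 2 refreshes it. A minimal sketch that does this from Python, assuming `gcloud` and `poetry` are on your `PATH`; `poetry_auth_args` and `refresh_poetry_credentials` are hypothetical helpers, not part of the SDK:

```python
import subprocess


def poetry_auth_args(source_name: str, token: str) -> list[str]:
    # Equivalent to: poetry config http-basic.<source> oauth2accesstoken "<token>"
    return ["poetry", "config", f"http-basic.{source_name}", "oauth2accesstoken", token]


def refresh_poetry_credentials(source_name: str = "gen-ai-internal") -> None:
    # Fetch a fresh token; check=True raises CalledProcessError if gcloud is not logged in.
    token = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()
    # Overwrite the stored (possibly expired) credential for this Poetry source.
    subprocess.run(poetry_auth_args(source_name, token), check=True)
```

Call `refresh_poetry_credentials()` before `poetry install` or `poetry add` when authentication starts failing.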

To install the gllm-evals-binary package with Poetry instead:

Step 1: Add the gen-ai source to your pyproject.toml

poetry source add --priority=explicit gen-ai https://glsdk.gdplabs.id/gen-ai/simple

Step 2: Configure the authentication

poetry config http-basic.gen-ai oauth2accesstoken "$(gcloud auth print-access-token)"

Step 3: Add the package to your project

poetry add --source gen-ai "gllm-evals-binary[deepeval,langchain,ragas]"

Environment Setup

Set a valid language model credential as an environment variable.

This example uses an OpenAI API key.

Get an OpenAI API key from the OpenAI Console.

export OPENAI_API_KEY="sk-..."
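The evaluation script reads this key with `os.getenv`, which silently returns `None` when the variable is missing. To fail early with a clear message instead, you can wrap the lookup in a small check. `require_env` is a hypothetical helper, not part of gllm-evals:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(f'{name} is not set. Export it first, e.g. export {name}="sk-..."')
    return value
```

In the script, you would then pass `model_credentials=require_env("OPENAI_API_KEY")`.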

Running Your First Evaluation

In this tutorial, we will evaluate the output of a RAG (retrieval-augmented generation) pipeline.

Create a script called eval.py:

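import asyncio
import os

from gllm_evals.evaluator.geval_generation_evaluator import GEvalGenerationEvaluator
from gllm_evals.types import RAGData


async def main():
    # Use an OpenAI model as the evaluator; the API key comes from the environment.
    evaluator = GEvalGenerationEvaluator(
        model="openai/gpt-4.1",
        model_credentials=os.getenv("OPENAI_API_KEY"),
    )

    # A single RAG sample. generated_response is deliberately wrong ("New York")
    # so the evaluator has something to flag.
    data = RAGData(
        query="What is the capital of France?",
        expected_response="Paris",
        generated_response="New York",
        retrieved_context="Paris is the capital of France.",
    )

    # evaluate() is a coroutine, so it is awaited inside the async main().
    result = await evaluator.evaluate(data)
    print(result)


if __name__ == "__main__":
    # asyncio.run() starts an event loop and runs main() to completion.
    asyncio.run(main())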
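Because `evaluate` is a coroutine, several records can be scored concurrently. The sketch below assumes, beyond what this page states, that one evaluator instance can safely handle concurrent `evaluate` calls; it caps concurrency with a semaphore to stay within API rate limits. `evaluate_many` is a hypothetical helper that works with any object exposing an async `evaluate(data)` method.

```python
import asyncio


async def evaluate_many(evaluator, records, max_concurrency: int = 4):
    """Evaluate `records` concurrently and return the results in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(record):
        # Limit how many evaluate() calls are in flight at once.
        async with semaphore:
            return await evaluator.evaluate(record)

    # gather() returns results in the same order as its arguments.
    return await asyncio.gather(*(run_one(r) for r in records))
```

Inside `main()`, you could replace the single call with `results = await evaluate_many(evaluator, [data, ...])`.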

Run the script:

python eval.py

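The evaluator returns an evaluation result for the given input. The output should look similar to the following; the explanations are written by the language model, so the exact wording may vary: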

{
    'geval_generation_evals': {
        'relevancy_rating': 'bad',
        'possible_issues': ['Retrieval Issue', 'Generation Issue'],
        'score': 0,
        'completeness': {
            'score': 1,
            'explanation': "The expected output contains one substantive statement: 'Paris' as the capital of France. The actual output, 'New York', does not match this statement and is a critical error regarding the key information."
        },
        'groundedness': {
            'score': 3,
            'explanation': "The response 'New York' is not mentioned in the retrieved context and is factually incorrect since the context clearly states that Paris is the capital of France. This is a critical factual mistake, rendering the answer fully unsupported by the context or the question."
        },
        'redundancy': {
            'score': 1,
            'explanation': 'The response provides a single, incorrect answer without any repetition, restatement, or elaboration. Only one idea is presented, making it concise and to the point, regardless of accuracy.'
        }
    }
}
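For logging or comparing runs, you may only need the overall rating, score, possible issues, and per-metric scores. This sketch assumes `result` is a plain dictionary laid out like the example output above (top-level key `geval_generation_evals`, with per-metric dictionaries holding `score` and `explanation`); other evaluators or SDK versions may return a different structure. `summarize_result` is a hypothetical helper.

```python
def summarize_result(result: dict, key: str = "geval_generation_evals") -> dict:
    """Flatten an evaluation result into a compact summary."""
    evals = result[key]
    # Any nested dict with a "score" is treated as a per-metric result.
    metric_scores = {
        name: value["score"]
        for name, value in evals.items()
        if isinstance(value, dict) and "score" in value
    }
    return {
        "rating": evals.get("relevancy_rating"),
        "score": evals.get("score"),
        "issues": list(evals.get("possible_issues", [])),
        "metric_scores": metric_scores,
    }
```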

Congratulations! You have successfully run your first evaluation.

Recommendation

If you want to run an end-to-end evaluation, use the evaluate() convenience function instead of the step-by-step approach above.

It automatically handles experiment tracking (via the Experiment Tracker) and integrates the results into your existing Dataset, so you don't have to wire these pieces together manually.

Next Steps

You're now ready to start using our evaluators. We offer several prebuilt evaluators to get you started:

Looking for something else? Build your own custom evaluator here.

*All fields are optional and can be adjusted depending on the chosen metric.

Last updated 3 months ago
