Evaluate GLChat Tutorial
In this guide, we will learn how to use evaluate_glchat to generate GLChat message responses and evaluate their performance on a QA dataset.
The evaluation focuses on question-answering capabilities, with support for web search and PII handling. The dataset and experiment results can then be accessed in Langfuse for monitoring. To view more details on each component, click it in the sidebar of the Evaluation page.
Prerequisites
Before you can start evaluating a GLChat QA dataset, you need to prepare the following:
Required Parameters
1. User ID (user_id)
The user_id is a unique identifier for the user who will be interacting with the specified GLChat application. This information is needed to create a conversation or message.
Where to get it:
From your existing user in GLChat: if you already have a user in your GLChat application, you can use its ID as the user_id.
You can also provide any user that has access to the application you want to test.
2. Chatbot ID (chatbot_id)
The chatbot_id identifies which chatbot or application configuration to use for the conversation. This information is needed to create a conversation or message.
3. [Optional] Model Name (model_name)
The model_name specifies which language model to use when generating GLChat responses. It can be set to a model's display name within an application / chatbot. If not specified, responses will be generated using that application's default model.

Required Keys
For GLChat
We will also need access to the GLChat credentials to generate responses. Please contact the GLChat team if you do not have them yet. The required keys are: GLCHAT_BASE_URL and GLCHAT_API_KEY.
For Langfuse
We will need Langfuse credentials to trace, debug, and view the evaluation results for our GLChat QA system. If you do not have any Langfuse credentials yet, you can follow the New User Configuration to get them. The required keys are: LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST.
Step 0: Install the Required Libraries
We need to install the required libraries for GLChat evaluation, including the GLChat SDK and Langfuse.
Install GLChat SDK with evals
The evals module inside glchat-sdk is currently private and requires special access. To use the evaluation functionality, you need to install the package with the evals extra.
Using poetry
Using pip
gllm-evals and langfuse are included automatically when you install glchat-sdk with the evals extra.
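To verify the installation, you can try importing the packages. This is a quick sanity check only; the Python module names below (glchat_sdk, gllm_evals) are inferred from the package names and may differ in your environment.

```python
# Quick sanity check that the evaluation dependencies are importable.
# Note: the module names below are assumptions based on the package names.
import glchat_sdk   # installed via the `evals` extra
import gllm_evals   # pulled in by the `evals` extra
import langfuse     # pulled in by the `evals` extra

print("glchat-sdk:", getattr(glchat_sdk, "__version__", "unknown"))
print("langfuse:", getattr(langfuse, "__version__", "unknown"))
```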
Step 1: Set Up Environment and Configuration
Prepare the environment variables for the evaluation script:
With the environment variables set, we can now verify and use the GLChat SDK and Langfuse.
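For example, the credentials can be set from Python before the evaluation runs. This is a sketch only; replace the placeholder values with your own credentials.

```python
import os

# GLChat credentials (obtain these from the GLChat team)
os.environ["GLCHAT_BASE_URL"] = "https://your-glchat-instance.example.com"  # placeholder
os.environ["GLCHAT_API_KEY"] = "your-glchat-api-key"                        # placeholder

# Langfuse credentials (see the New User Configuration guide)
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."                             # placeholder
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."                             # placeholder
os.environ["LANGFUSE_HOST"] = "https://langfuse.obrol.id"                   # host used in this guide
```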
Step 2: Prepare Your Dataset
Before we can evaluate, we need to prepare a dataset with all the information needed for evaluation.
To ensure compatibility, your dataset must use standardized column names. We enforce a strict naming convention so the module can automatically recognize and process your data correctly.
Before using the module, please make sure your dataset columns match the required names exactly (case sensitive).
question_id: Unique identifier for each query.
query: The question to ask.
expected_response: The expected answer to compare against.
search_type ("normal" or "search"): Whether to enable search functionality in GLChat. If the column is not provided, all rows default to "normal" (no search capability).
enable_pii (True or False): Whether to enable PII processing. If the column is not provided, all rows default to False (no PII masking).
model_name: The model used for response generation for each row in GLChat. If the column is not provided, the global configuration in the provided config is used; if that is also not set, the default model of the provided chatbot is used.
chatbot_id: The chatbot ID used for response generation for each row in GLChat. If the column is not provided, the global configuration in the provided config is used.
attachments: The file names to be used for each row. Leave it empty for rows that do not use any attachments. This column is mandatory ONLY if you have attachment(s) to be used for response generation. For more details, see the Attachments page.
Other additional fields: Any additional fields you deem necessary to include. They will not affect the evaluation process.
As an example, you can download the following CSV file:
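Alternatively, if you prefer to build an equivalent dataset in code, the sketch below constructs a small example with the required column names and writes it to CSV. The sample rows are purely illustrative.

```python
import pandas as pd

# Minimal QA dataset following the required (case-sensitive) column names.
dataset = pd.DataFrame(
    [
        {
            "question_id": "q1",
            "query": "Who is Kartini?",
            "expected_response": "Kartini was an Indonesian national hero who pioneered women's education.",
            "search_type": "normal",   # "normal" (no search) or "search"
            "enable_pii": False,       # True to enable PII processing
        },
        {
            "question_id": "q2",
            "query": "What is the capital city of Indonesia?",
            "expected_response": "Jakarta.",
            "search_type": "search",   # enable web search for this row
            "enable_pii": False,
        },
    ]
)

dataset.to_csv("glchat_qa_dataset.csv", index=False)
```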
Step 3: Instrument your GLChat Configuration
Before we can evaluate our GLChat system, we need to create a GLChat configuration using GLChatConfig to specify which settings to use.
🟢 Minimal Configuration (Bare Minimum)
Use this when you just want to get started fast with default settings.
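A minimal sketch of what this could look like, assuming GLChatConfig accepts the user_id and chatbot_id parameters described in the Prerequisites; the exact import path may differ in your glchat-sdk version.

```python
# Assumed import path; adjust to match your installed glchat-sdk version.
from glchat_sdk.evals import GLChatConfig

config = GLChatConfig(
    user_id="your-user-id",        # a user with access to the target application
    chatbot_id="your-chatbot-id",  # the chatbot / application to evaluate
)
```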
That's all you need; the rest will be handled by the module using defaults.
🔵 Full Configuration (Complete Example)
Use this version if you want full control over every parameter and behavior.
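A fuller sketch might look like the following. The model_name parameter comes from the Prerequisites section, while the credential fields are hypothetical illustrations; the SDK may instead read GLCHAT_BASE_URL and GLCHAT_API_KEY from the environment.

```python
import os

from glchat_sdk.evals import GLChatConfig  # assumed import path

config = GLChatConfig(
    user_id="your-user-id",
    chatbot_id="your-chatbot-id",
    model_name="your-model-display-name",     # optional: a model display name in the chatbot
    base_url=os.environ["GLCHAT_BASE_URL"],   # hypothetical field shown for illustration;
    api_key=os.environ["GLCHAT_API_KEY"],     # credentials may be read from the environment instead
)
```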
💡 Tip: Start with the minimal config and gradually add more if you need more customization. If a config parameter is also available as a dataset column, the value in the dataset column takes priority.
Step 4: Prepare Attachments (Optional)
If your dataset does not need any attachments, feel free to skip this step. To find out more about which attachment types are currently supported and how to set the attachment configuration for each type, visit the Attachments page.
In this example, we use a local attachment as the simplest setup. Regardless of the attachment type you choose, make sure your files are already stored in a storage location we currently support.
For this dataset example, you can download the file and put it in your local directory:
After that, you can create a local attachment configuration. For example, if you put the above image at the local path /home/user/Documents/files/gambar kartini.jpg, you can add the following parameter based on the attachment type:
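As a sketch, a local attachment configuration could be expressed like this. The attachment_config name and its fields are assumptions for illustration; refer to the Attachments page for the exact schema.

```python
# Hypothetical local attachment configuration, mapping the file names referenced in
# the dataset's `attachments` column to their location on disk.
attachment_config = {
    "type": "local",
    "base_path": "/home/user/Documents/files",  # directory containing "gambar kartini.jpg"
}
```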
Step 5: Perform End-to-End Evaluation
To run the end-to-end evaluation, we can use a convenience function in glchat-sdk called evaluate_glchat. This function provides a streamlined interface for evaluating GLChat models using the existing gllm-evals framework. It eliminates the need to manually implement inference functions by providing a pre-built GLChat integration.
We can pass the dataset, the GLChat configuration, and the attachment config to the evaluate_glchat() function. In this example, we will use GEvalGenerationEvaluator, which is suitable for evaluating QA datasets. Since we want to use Langfuse, we will also use LangfuseExperimentTracker as the dedicated experiment tracker.
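Putting the pieces together, a sketch of the end-to-end call could look like the following. The import paths and keyword arguments are assumptions based on the names used in this guide, and GEvalGenerationEvaluator may require additional judge-model configuration of its own.

```python
import pandas as pd

# Assumed import paths; adjust to your installed versions of glchat-sdk and gllm-evals.
from glchat_sdk.evals import evaluate_glchat, GLChatConfig
from gllm_evals import GEvalGenerationEvaluator, LangfuseExperimentTracker

# Dataset prepared in Step 2.
dataset = pd.read_csv("glchat_qa_dataset.csv")

# GLChat configuration from Step 3.
config = GLChatConfig(
    user_id="your-user-id",
    chatbot_id="your-chatbot-id",
)

# Optional attachment configuration from Step 4 (hypothetical structure).
attachment_config = {
    "type": "local",
    "base_path": "/home/user/Documents/files",
}

results = evaluate_glchat(
    dataset=dataset,                                 # the QA dataset to evaluate
    config=config,                                   # the GLChat configuration
    attachment_config=attachment_config,             # optional; omit if no attachments are used
    evaluator=GEvalGenerationEvaluator(),            # LLM-as-a-judge evaluation for QA
    experiment_tracker=LangfuseExperimentTracker(),  # logs the dataset and results to Langfuse
)
```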
Step 6: View Evaluation Results in Langfuse
After running the evaluation, the dataset and experiment results you've provided will automatically be logged to Langfuse. This step shows you how to navigate the Langfuse UI and interpret your evaluation results.
Accessing Your Langfuse Dashboard
Navigate to your Langfuse project: Go to https://langfuse.obrol.id/.
Select your organization and project: Choose your dedicated organization and project (or the one you've just created).
Access the dashboard: You'll see various sections for analyzing your data.
View Dataset
To view the dataset you've just created, go to: Project → Datasets → select a dataset → Items. On this page, you can see all the data rows you have just evaluated, based on the provided Langfuse mapping. This dataset can also be reused for future evaluations.

To see more detail for each row, click one of the data items above.

View Dataset Runs (Leaderboard)
Dataset runs are executions over a dataset with per-item output. A dataset run represents an experiment. To view the dataset runs, go to: Project → Datasets → select a dataset → Runs. Here, you can view all the scores for each experiment, including LLM-as-a-judge score columns, both as aggregations and per-row values.

You can also click a specific dataset run to view all the data row results for that experiment:

View Traces / Observations
Traces / observations let you drill into individual spans and view the inputs, outputs (our evaluation results), and metadata. Go to: Project → Traces.
Below is the trace example:

View Sessions (Experiment Results)
Sessions group traces per experiment; you can review and annotate each trace within a session. You can access sessions at Project → Sessions.
Below is the session screenshot example:

Congratulations! You have just created your first GLChat QA evaluation!
Export to CSV
You can also optionally export the experiment results to CSV, for example by running a script like the following:
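A minimal sketch, assuming evaluate_glchat returns the per-row experiment results (for example, a list of dicts); the exact return type may differ in your SDK version.

```python
import pandas as pd

# `results` is assumed to be the per-row output returned by evaluate_glchat in Step 5.
pd.DataFrame(results).to_csv("glchat_qa_experiment_results.csv", index=False)
```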
Conclusion
This cookbook provides a simple guide to evaluating GLChat QA systems using Langfuse. By following these steps, you can:
Monitor your QA system's performance
Evaluate different models and configurations systematically
Track quality metrics and identify improvement opportunities
Ensure reliable and high-quality QA responses in production
Note: This is a simple guide to get you started with GLChat QA evaluation using Langfuse. For more comprehensive evaluation information and advanced techniques, please refer to the evaluation gitbook.