📈 Experiment Tracker

We provide BaseExperimentTracker and several ready-to-use experiment tracker classes. These can also be plugged into the evaluate() function to log evaluation results, making it easier to analyze and share outcomes.

Available Experiment Trackers


🪶 SimpleExperimentTracker

Use when: You want a lightweight, local tracker that logs results to CSV. It is great for quick tests, prototyping, or when you do not need a full UI.

Example usage:

from gllm_evals.experiment_tracker.simple_experiment_tracker import SimpleExperimentTracker

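# Create a local tracker; results are written as CSV files under output_dir.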
tracker = SimpleExperimentTracker(
    project_name="my_project", 
    output_dir="./my_experiments"
)
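# Log evaluation results to the local CSV output.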
tracker.log(...)

🌐 LangfuseExperimentTracker

Use when: You want a production-grade tracker integrated with Langfuse. It is great for detailed traces and spans, dataset and run management, and session- and dataset-level scoring.

Example usage:

New User Configuration

If you are new to Langfuse, you can follow these steps to set up and use the Langfuse Experiment Tracker:

  1. Create an Organization. Click New Organization on the Organizations page and enter the organization name. This gives you a top-level space to manage projects and members. Use a human-readable name, typically your company, team, or client name (e.g. client-XYZ).

  2. Manage members

    1. Invite teammates to the org/project with the roles you need (viewer/editor/admin).

    2. Important: Set yourself (or one trusted person) as the admin, so that person can invite and manage other project members in the organization.

  3. Create a Project. Experiments, datasets, traces, and API keys are project-scoped. Enter a project name to create the project, typically your project or application name (e.g. project-abc).

  4. Create API credentials. You can create API keys now, or later under Project → Settings → API Keys; generate the keys and copy:

    1. Public key

    2. Secret key

    3. Langfuse host

  5. Configure your environment. Most Langfuse clients (and our evaluate() integration) read these environment variables: LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST.

  6. Run an evaluation with Langfuse tracking enabled. With these credentials, you can now use the Langfuse Experiment Tracker, as sketched below.
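A minimal sketch of steps 5 and 6 follows. The environment variable names are the standard ones read by Langfuse clients; the import path and the constructor call for LangfuseExperimentTracker are assumptions (the import mirrors SimpleExperimentTracker's module layout), so treat this as an illustration and check the class itself for the exact signature.

import os

# Assumed import path, mirroring SimpleExperimentTracker's module layout.
from gllm_evals.experiment_tracker.langfuse_experiment_tracker import LangfuseExperimentTracker

# Credentials copied from Project → Settings → API Keys (step 4).
# These are the standard environment variable names read by Langfuse clients.
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"  # or your self-hosted Langfuse URL

# Assumed no-argument constructor; consult the class for the exact arguments
# (it may accept e.g. a project name or explicit credentials instead).
tracker = LangfuseExperimentTracker()

# Pass the tracker to evaluate() so datasets, runs, scores, and traces are logged to Langfuse.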


What happens automatically

  • Auto-dataset creation (when needed). If you pass a dataset that does not already exist in Langfuse, we automatically create it: by default, the expected_response column is used as the ground truth response, or columns are mapped according to the mapping dictionary you provide (a hypothetical sketch follows this list; for the real mapping example, you can visit this subsection). You’ll find the dataset under Project → Datasets in the left sidebar after a round of evaluation.

  • Experiment run logging. Your evaluation is logged, including runs, metrics/scores, and the underlying traces.

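As a purely hypothetical illustration of such a mapping dictionary (the key and value names here are invented; the actual format expected by evaluate() is shown in the mapping example referenced above):

# Hypothetical mapping from your dataset's column names to the fields used when
# the Langfuse dataset is auto-created. Names are illustrative only.
column_mapping = {
    "question": "input",
    "reference_answer": "expected_response",
}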

Where to see results (in Langfuse)

  • Datasets: the dataset created/linked by evaluate(). Path: Project → Datasets

  • Dataset runs: executions over a dataset with per-item outputs and evaluator scores. Path: Project → Datasets → select a dataset → Runs

  • Traces / Observations: drill into individual calls/spans, inputs/outputs, timings. Path: Project → Traces (and Observations)

  • Sessions: grouped traces per experiment; you can review, share, and even score sessions. Path: Project → Sessions.


🔁 Refresh Langfuse Experiment Tracker

To refresh the scores in Langfuse after you have updated them, you can use the following function:


📁 Export Langfuse Experiment Results to CSV

You can export the Langfuse experiment results with all the updated scores to CSV using the following function:
