# Experiment Tracker

We provide `BaseExperimentTracker` and several ready-to-use experiment tracker classes. These can also be plugged into the `evaluate` function to log and record evaluation results, making it easier to analyze and share outcomes.

## Available Experiment Trackers

1. [SimpleExperimentTracker](#simpleexperimenttracker)
2. [LangfuseExperimentTracker](#langfuseexperimenttracker)
   1. [Refresh Langfuse Experiment Tracker](#refresh-langfuse-experiment-tracker)
   2. [Export Langfuse Experiment Results to CSV](#export-langfuse-experiment-results-to-csv)

***

### 🪶 SimpleExperimentTracker

**Use when:** You want a lightweight, local tracker that logs results to CSV files. It is great for quick tests, prototyping, or when you do not need a full UI.

Example usage:

```python
from gllm_evals.experiment_tracker.simple_experiment_tracker import SimpleExperimentTracker

tracker = SimpleExperimentTracker(
    project_name="my_project", 
    output_dir="./my_experiments"
)
tracker.log(...)
```
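
The intro mentions that trackers can be plugged into the `evaluate` function. As a minimal sketch only (the `evaluate` import path and the tracker parameter name are assumptions here; check your version of `gllm_evals` for the actual signature), the wiring might look like:

```python
# NOTE: hypothetical sketch. The import path for `evaluate` and the
# `experiment_tracker` parameter name are assumptions; adjust them to
# your version of gllm_evals.
from gllm_evals import evaluate  # assumed import path

from gllm_evals.experiment_tracker.simple_experiment_tracker import SimpleExperimentTracker

tracker = SimpleExperimentTracker(project_name="my_project", output_dir="./my_experiments")

results = evaluate(
    data=my_dataset,             # your evaluation dataset
    experiment_tracker=tracker,  # results are logged as CSV under ./my_experiments
)
```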

***

### 🌐 LangfuseExperimentTracker

**Use when:** You want a production-grade tracker integrated with Langfuse. It is great for detailed traces & spans, dataset & run management, and session- and dataset-level scoring.

Example usage:

```python
from langfuse import get_client

from gllm_evals.experiment_tracker.langfuse_experiment_tracker import LangfuseExperimentTracker

tracker = LangfuseExperimentTracker(
    project_name="my_project",
    langfuse_client=get_client(),
)
tracker.log(...)
```

#### New User Configuration

If you are new to Langfuse, follow these steps to set up the Langfuse Experiment Tracker:

1. **Open the Langfuse host**\
   Go to [**https://langfuse.obrol.id/**](https://langfuse.obrol.id/) and log in with your GDP Labs account. This instance is managed by the BOSA team.
2. **Create an Organization**\
   On the `Organizations` page, click `New Organization` and enter an organization name. This gives you a top-level space to manage projects and members. Use a human-readable company, team, or client name (e.g., `glchat`, `catapa`, `client-xyz`).
3. **Manage members**
   1. Invite teammates to the org/project with the roles you need (viewer/editor/admin).
   2. **Important:** Set yourself (or one trusted person) as the **admin**, so that person can invite and manage other project members in the organization.
4. **Create a Project**\
   Experiments, datasets, traces, and API keys are project-scoped. Enter a project name to create the project, typically your project or application name (e.g., `glchat-beta`).
5. **Create API credentials**\
   You can create API keys now or later in **Project → Settings → API Keys**. Generate the keys and copy:
   1. **Public key**
   2. **Secret key**
   3. **Langfuse host**
6. **Configure your environment**\
   Most Langfuse clients (and our `evaluate()` integration) read these env vars:

   ```bash
   export LANGFUSE_HOST="https://langfuse.obrol.id"
   export LANGFUSE_PUBLIC_KEY="pk-xxxxxxxx"
   export LANGFUSE_SECRET_KEY="sk-xxxxxxxx"
   ```
7. **Run an evaluation with Langfuse tracking enabled**\
   With these credentials, you can now use the Langfuse Experiment Tracker. A quick credential check is sketched below.
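
Before running a full evaluation, you can verify that the environment variables are picked up correctly. The snippet below is a minimal sketch using the Langfuse Python client's `auth_check()`, which pings the host and reports whether the keys are valid:

```python
from langfuse import get_client

# get_client() reads LANGFUSE_HOST, LANGFUSE_PUBLIC_KEY, and
# LANGFUSE_SECRET_KEY from the environment.
client = get_client()

# auth_check() pings the configured Langfuse host and returns True
# when the credentials are valid.
if client.auth_check():
    print("Langfuse credentials are valid.")
else:
    print("Authentication failed: check your keys and host.")
```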

***

#### What happens automatically

* **Auto-dataset creation (when needed).**\
  If you pass a dataset that does **not** already exist in Langfuse, we create it automatically, treating either the `expected_response` column as the ground truth response or using the given mapping dictionary (an illustrative, hypothetical mapping shape is sketched after this list). For a full mapping example, see [this subsection](https://gdplabs.gitbook.io/sdk/~/revisions/beykCxz0UanaEX0sPJJu/tutorials/end-to-end-evaluation#using-langfuse-experiment-tracker-with-custom-mapping). You’ll find the dataset under **Project → Datasets** in the left sidebar after a round of evaluation.
* **Experiment run logging.**\
  Your evaluation is logged end to end, including runs, metrics/scores, and the underlying traces.
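
For illustration only, a mapping dictionary relates your dataset's column names to the fields Langfuse expects. The keys and values below are hypothetical; follow the linked subsection for the exact format your version uses:

```python
# Hypothetical mapping shape: the exact keys depend on your
# gllm_evals version; see the linked subsection for the real format.
mapping = {
    "expected_response": "ground_truth",  # dataset column -> Langfuse field
}
```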

***

#### Where to see results (in Langfuse)

* **Datasets:** the dataset created/linked by `evaluate()`.\
  *Path:* Project → **Datasets**
* **Dataset runs:** executions over a dataset with per-item outputs and evaluator scores.\
  *Path:* Project → **Datasets** → select a dataset → **Runs**
* **Traces / Observations:** drill into individual calls/spans, inputs/outputs, timings.\
  *Path:* Project → **Traces** (and Observations)
* **Sessions:** grouped traces per experiment; you can review, share, and even score sessions.\
  *Path:* Project → **Sessions**

#### Service window (our hosted Langfuse)

The hosted Langfuse instance shuts down automatically at 23:59 WIB. Contact the evals or BOSA team for the Slack command to turn the Langfuse host on or off.

***

#### 🔁 Refresh Langfuse Experiment Tracker

To refresh the scores in Langfuse after updating them, run the following script:

```python
import asyncio
from langfuse import get_client

from gllm_evals.experiment_tracker.langfuse_experiment_tracker import LangfuseExperimentTracker
from glchat_sdk.evals.constant import GLChatDefaults

async def main():
    """Refresh the Langfuse session-level scores for a given run."""
    run_id = ""  # Fill this with the session ID you want to refresh.
    exp_tracker = LangfuseExperimentTracker(
        langfuse_client=get_client(),
        project_name=GLChatDefaults.PROJECT_NAME,
    )
    exp_tracker.refresh_score(run_id=run_id)

if __name__ == "__main__":
    asyncio.run(main())
```

***

#### 📁 Export Langfuse Experiment Results to CSV

You can export the Langfuse experiment results, with all updated scores, to CSV using the following script:

```python
import asyncio
from langfuse import get_client

from gllm_evals.experiment_tracker.langfuse_experiment_tracker import LangfuseExperimentTracker
from glchat_sdk.evals.constant import GLChatDefaults

async def main():
    """Export the Langfuse experiment results for a given run to CSV."""
    run_id = ""  # Fill this with the session ID you want to export.
    exp_tracker = LangfuseExperimentTracker(
        langfuse_client=get_client(),
        project_name=GLChatDefaults.PROJECT_NAME,
    )
    exp_tracker.export_experiment_results(run_id=run_id, export_type="csv")

if __name__ == "__main__":
    asyncio.run(main())
```
