# Dataset

We provide a `BaseDataset` class as the foundation, along with several ready-to-use dataset types. These let you load data from different sources in a unified way. Any of these dataset objects can also be passed to the `evaluate` function for end-to-end evaluation.

***

## Available Datasets

1. [DictDataset](#dictdataset)
2. [HuggingFaceDataset](#huggingfacedataset)
3. [SpreadsheetDataset](#spreadsheetdataset)
4. [LangfuseDataset](#langfusedataset)

***

### 📖 DictDataset

**Use when:** You want to store your dataset directly as a **list of dictionaries**.

It can be created from a JSONL or CSV file.

Example usage:

```python
from gllm_evals.dataset.dict_dataset import DictDataset

csv_path = "path/to/csv/data"
data: DictDataset = DictDataset.from_csv(csv_path)
```
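A `DictDataset` holds its rows as a list of dictionaries, one dictionary per example. The sketch below uses only the standard library to show what that shape looks like when the same rows come from a CSV file or a JSONL file. It is an illustration of the data format, not the `DictDataset` implementation, and the column names `query` and `expected_response` are hypothetical.

```python
import csv
import io
import json


def rows_from_csv(text: str) -> list[dict[str, str]]:
    # Each CSV row becomes one dict keyed by the header columns
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_jsonl(text: str) -> list[dict]:
    # Each non-empty line of a JSONL file is one JSON object
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical sample data: the same two rows in both formats
csv_text = "query,expected_response\nWhat is 2+2?,4\nCapital of France?,Paris\n"
jsonl_text = (
    '{"query": "What is 2+2?", "expected_response": "4"}\n'
    '{"query": "Capital of France?", "expected_response": "Paris"}\n'
)

rows = rows_from_csv(csv_text)
assert rows == rows_from_jsonl(jsonl_text)
print(rows[0])  # {'query': 'What is 2+2?', 'expected_response': '4'}
```

Note that the `csv` module always reads values as strings, while JSONL rows can also carry numbers, lists, or nested objects.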

***

### 🤗 HuggingFaceDataset

**Use when:** You want to load datasets directly from the HuggingFace Hub or from a Python list.

Example usage:

```python
from datasets import load_dataset

from gllm_evals.dataset.hf_dataset import HuggingFaceDataset

hf_dataset_path = "path/to/hf/dataset"
data: HuggingFaceDataset = HuggingFaceDataset(
    # Load one split of the Hub dataset; extra `load_dataset` options can be passed as keyword arguments
    dataset=load_dataset(path=hf_dataset_path, split="train")
)
```

***

### 📝 SpreadsheetDataset

**Use when:** You want to load datasets from Google Sheets.

Example usage:

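```python
import asyncio
import os

from gllm_evals.dataset.spreadsheet_dataset import SpreadsheetDataset


async def main() -> SpreadsheetDataset:
    # `from_gsheets` is a coroutine, so it must be awaited inside an async function
    return await SpreadsheetDataset.from_gsheets(
        sheet_id="sheet-id",
        worksheet_name="worksheet-name",
        client_email=os.getenv("GOOGLE_SHEETS_CLIENT_EMAIL"),
        private_key=os.getenv("GOOGLE_SHEETS_PRIVATE_KEY"),
    )


data: SpreadsheetDataset = asyncio.run(main())
```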
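The example reads the service-account email and private key from environment variables, and `os.getenv` silently returns `None` when a variable is unset. The helper below is a hypothetical sketch (not part of `gllm_evals`) that fails fast with a clear error instead. It also converts literal `\n` sequences back into real newlines, assuming the PEM key was stored on a single line in the environment variable, which is a common convention; whether `from_gsheets` already handles this is not stated here.

```python
import os
from typing import Mapping, Optional, Tuple


def load_gsheets_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Hypothetical helper: return (client_email, private_key) or raise a clear error."""
    env = os.environ if env is None else env
    names = ("GOOGLE_SHEETS_CLIENT_EMAIL", "GOOGLE_SHEETS_PRIVATE_KEY")
    # Treat unset and empty variables the same way
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    client_email = env["GOOGLE_SHEETS_CLIENT_EMAIL"]
    # Assumption: the PEM key was stored on one line with literal "\n" escapes
    private_key = env["GOOGLE_SHEETS_PRIVATE_KEY"].replace("\\n", "\n")
    return client_email, private_key
```

The returned `client_email` and `private_key` would then be passed to `SpreadsheetDataset.from_gsheets` in place of the two `os.getenv` calls.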

***

### 📊 LangfuseDataset

**Use when:** You want to manage datasets in Langfuse, or import them from multiple sources (Langfuse itself, dictionaries, Google Sheets, CSV, or JSONL).

Example usage:

```python
from langfuse import get_client

from gllm_evals.dataset.langfuse_dataset import LangfuseDataset

data: LangfuseDataset = LangfuseDataset.from_langfuse(
    langfuse_client=get_client(),
    dataset_name="dataset-name"
)
```
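`get_client()` configures the Langfuse client from environment variables, so those need to be set before `from_langfuse` is called. A minimal pre-flight check might look like the sketch below; the helper name is hypothetical, and it assumes the standard Langfuse SDK credential variables `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` (the optional `LANGFUSE_HOST` is not checked).

```python
import os
from typing import List, Mapping, Optional

# Credentials read by the Langfuse SDK; LANGFUSE_HOST is optional and not checked here
REQUIRED_LANGFUSE_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def missing_langfuse_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: list required Langfuse variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_LANGFUSE_VARS if not env.get(name)]
```

Raising an error when `missing_langfuse_vars()` returns a non-empty list, before calling `get_client()`, turns a missing credential into an explicit failure at startup.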
