Direct Preference Optimization (DPO)

Note: DPO fine-tuning is currently under pull-request review, so this technique is not yet available in the SDK.

What is Direct Preference Optimization (DPO)?

Direct Preference Optimization (DPO) is a preference-based fine-tuning technique that aligns a model using paired comparisons between responses, rather than relying on reinforcement learning or a separate reward model. For each input prompt, DPO uses a chosen response (preferred) and a rejected response (less preferred) to directly increase the likelihood of generating the chosen output while decreasing the likelihood of the rejected one, measured relative to a frozen reference model. This is achieved through a simple classification-style loss, derived in closed form from the KL-regularized reinforcement learning objective, which simplifies training while still capturing preference signals effectively. DPO is particularly useful when your datasets express relative human preferences, and it typically produces stable, efficient, and preference-aligned model behavior.
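To make the objective concrete, here is a minimal pure-Python sketch of the standard DPO loss for a single preference pair. It assumes you already have the summed log-probability of each full response under the model being trained (the policy) and under a frozen reference model; beta (here 0.1) controls how far the policy may drift from the reference. The function name dpo_loss is hypothetical and is not part of gllm-training.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Hypothetical helper: DPO loss for one (prompt, chosen, rejected) triple.

    Each argument is the summed log-probability of a full response given the prompt.
    """
    # Log-ratio of policy vs. reference for each response (beta times this is the implicit reward).
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # How much more the policy prefers "chosen" over "rejected" than the reference does.
    z = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)), written in a numerically stable form for large |z|.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(round(dpo_loss(-12.0, -15.0, -12.0, -15.0), 4))  # 0.6931
# Raising the chosen response's log-probability relative to the reference lowers the loss.
print(dpo_loss(-10.0, -15.0, -12.0, -15.0) < math.log(2))  # True
```

Minimizing this loss pushes the chosen log-ratio up and the rejected log-ratio down, which is exactly the "increase chosen, decrease rejected" behavior described above, without training a reward model.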

Prerequisites

Before installing, make sure you have:

  1. gcloud CLI - required because gllm-training is a private library hosted in a private Google Cloud repository.

After installing, please run

gcloud auth login

to authorize gcloud to access the Cloud Platform with Google user credentials.

Note: Our internal gllm-training package is hosted in a secure Google Cloud Artifact Registry, so you need to authenticate with the gcloud CLI to access and download the package during installation.

  2. Minimum requirements:

    1. A CUDA-compatible GPU

    2. Recommended GPUs:

      1. RTX A5000

      2. RTX 40/50 series

    3. Windows or Linux (macOS is currently not supported)
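Before installing, you may want to confirm that the machine meets these requirements. The sketch below is a hypothetical helper, not part of gllm-training: it treats only Linux and Windows as supported, uses PyTorch (if it is already installed) to detect a CUDA GPU, and otherwise falls back to checking whether nvidia-smi is on the PATH as a rough hint that an NVIDIA driver is present.

```python
import platform
import shutil


def check_environment() -> dict:
    """Hypothetical helper: report whether this machine looks suitable for DPO fine-tuning."""
    system = platform.system()  # "Linux", "Windows", or "Darwin" (macOS)
    report = {"os": system, "os_supported": system in ("Linux", "Windows")}
    try:
        import torch  # only importable once PyTorch is installed

        report["cuda_available"] = torch.cuda.is_available()
        report["gpu_name"] = (
            torch.cuda.get_device_name(0) if report["cuda_available"] else None
        )
    except ImportError:
        # Without PyTorch, nvidia-smi on the PATH suggests an NVIDIA driver is installed.
        report["cuda_available"] = shutil.which("nvidia-smi") is not None
        report["gpu_name"] = None
    return report


if __name__ == "__main__":
    print(check_environment())
```

The fallback check only shows that a driver tool exists, not that the GPU is CUDA-compatible, so treat a True result without PyTorch as a hint rather than a guarantee.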

Installation

pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-training"

Quickstart

Let's start with a basic fine-tuning example using DPOTrainer. To run DPO fine-tuning, you need to specify a model name, a dpo_column_mapping, and a dataset path. Make sure your dataset contains a prompt column, a chosen column with the preferred (correct) response, and a rejected column with the less-preferred response.

# Main Code
from gllm_training import DPOTrainer

# Fine-tune a small Qwen3 model on the preference pairs stored under examples/dpo_csv
dpo_trainer = DPOTrainer(
    model_name="Qwen/Qwen3-0.6B",
    datasets_path="examples/dpo_csv",
)

# Start DPO training
dpo_trainer.train()
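The quickstart reads preference data from examples/dpo_csv, but the layout of that folder is not described on this page. The sketch below assumes a single CSV file inside it (the file name train.csv is hypothetical) whose header uses the prompt, chosen, and rejected columns described above; if your dpo_column_mapping expects different names, adjust the field names accordingly. The helper write_dpo_csv and the example rows are illustrative only.

```python
import csv
from pathlib import Path

# Hypothetical example rows: a prompt, a preferred answer, and a less-preferred answer.
ROWS = [
    {
        "prompt": "What is the capital of France?",
        "chosen": "The capital of France is Paris.",
        "rejected": "France does not have a capital.",
    },
    {
        "prompt": "Translate 'good morning' into Indonesian.",
        "chosen": "Selamat pagi.",
        "rejected": "Selamat malam.",
    },
]


def write_dpo_csv(path: Path, rows: list[dict]) -> None:
    """Hypothetical helper: write preference pairs to a CSV with prompt/chosen/rejected columns."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings (recommended by the csv docs).
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["prompt", "chosen", "rejected"])
        writer.writeheader()
        writer.writerows(rows)


# Usage (assumed file name inside the datasets_path folder):
# write_dpo_csv(Path("examples/dpo_csv/train.csv"), ROWS)
```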

Fine-tuning a model using a YAML file

We can run experiments in a more structured way by using a YAML file. The current DPO fine-tuning SDK supports both online data from Google Spreadsheets and local data in CSV format.

Example 1: Fine-tuning using online data

You can prepare your experiment with a YAML file that loads the training and validation data from a Google Spreadsheet.

1. Configure environment variables (.env)

Fill in the GOOGLE_SHEETS_CLIENT_EMAIL and GOOGLE_SHEETS_PRIVATE_KEY fields. If you don’t have these keys, please contact the infrastructure team.
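A quick way to catch a missing credential before starting a run is to check the process environment. The sketch below assumes the .env file has already been loaded into the environment (for example by a tool such as python-dotenv; this page does not specify how the SDK reads it) and only reports which required variables are unset or blank. The helper name missing_env_vars is hypothetical.

```python
import os

REQUIRED_SHEETS_VARS = ("GOOGLE_SHEETS_CLIENT_EMAIL", "GOOGLE_SHEETS_PRIVATE_KEY")


def missing_env_vars(names, environ=os.environ) -> list[str]:
    """Hypothetical helper: return the names that are unset or blank in the given mapping."""
    return [name for name in names if not environ.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_SHEETS_VARS)
    print("Missing variables:", ", ".join(missing) if missing else "none")
```

The same helper can be reused for the AWS_ACCESS_KEY, AWS_SECRET_KEY, and AWS_REGION variables needed when uploading the model to cloud storage.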

2. Share the spreadsheet

Share the Google Spreadsheet containing your training and validation data with the email address set in GOOGLE_SHEETS_CLIENT_EMAIL.

3. Experiment configuration (dpo_experiment_config.yml)

You can use a YAML file to plan your fine-tuning experiments. To fine-tune with YAML, you need to define the required variables in the file.

4. Fine-tuning

To run fine-tuning, load the YAML configuration with YamlConfigLoader and specify the experiment ID when calling the load function.

Example 2: Fine-tuning using local data

The remaining hyperparameter configurations for fine-tuning are the same as when using online data. Below is an example YAML configuration for using local data for training and validation.
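Because local CSV files are easy to get slightly wrong, it can help to validate them before launching an experiment. This hypothetical helper assumes each CSV must contain prompt, chosen, and rejected columns with no empty cells; it also flags rows where the chosen and rejected responses are identical, since such a pair carries no preference signal. If your YAML maps different column names, pass them through the required parameter in the same prompt/chosen/rejected order.

```python
import csv
from pathlib import Path

REQUIRED_COLUMNS = ("prompt", "chosen", "rejected")


def validate_dpo_csv(path: Path, required=REQUIRED_COLUMNS) -> list[str]:
    """Hypothetical helper: list problems in a DPO CSV file (an empty list means it looks fine)."""
    _, chosen_col, rejected_col = required
    problems = []
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [col for col in required if col not in header]
        if missing:
            return [f"missing columns: {', '.join(missing)}"]
        # Rows are counted from 1, excluding the header.
        for row_no, row in enumerate(reader, start=1):
            for col in required:
                # Short rows leave missing fields as None, so treat None as empty.
                if not (row[col] or "").strip():
                    problems.append(f"row {row_no}: empty '{col}'")
            if row[chosen_col] and row[chosen_col] == row[rejected_col]:
                problems.append(f"row {row_no}: chosen and rejected are identical")
    return problems
```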

Upload model to cloud storage

When running experiments, we don't always save the model directly to the cloud. Instead, we may first evaluate its performance before uploading it to cloud storage. To support this workflow, we provide a save_model function that uploads the model as a separate step after fine-tuning.

1. Configure environment variables (.env)

Fill in the AWS_ACCESS_KEY, AWS_SECRET_KEY and AWS_REGION fields. If you don’t have these keys, please contact the infrastructure team.

2. Upload the model

To upload the model, set up the storage configuration and pass the model path to the save_model function. The model path should point to the directory containing your best adapter model.
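If the trainer follows the Hugging Face transformers Trainer conventions (an assumption; this page does not say what gllm-training uses internally), each checkpoint-<step> folder contains a trainer_state.json whose best_model_checkpoint field records the best checkpoint when a best-model metric is configured, and a PEFT adapter folder contains adapter_config.json. The hypothetical helper below uses those two conventions to find the directory you would pass to save_model; it does not call save_model itself, whose exact arguments are not shown here.

```python
import json
from pathlib import Path
from typing import Optional


def find_best_adapter_dir(output_dir: Path) -> Optional[Path]:
    """Hypothetical helper: return the best checkpoint dir if it holds a PEFT adapter."""
    if not output_dir.is_dir():
        return None
    # Keep only folders named checkpoint-<integer step>, ordered by step.
    checkpoints = sorted(
        (
            p
            for p in output_dir.glob("checkpoint-*")
            if p.is_dir() and p.name.split("-")[-1].isdigit()
        ),
        key=lambda p: int(p.name.split("-")[-1]),
    )
    if not checkpoints:
        return None
    # The latest checkpoint's state records the best checkpoint seen so far.
    state_file = checkpoints[-1] / "trainer_state.json"
    if not state_file.is_file():
        return None
    state = json.loads(state_file.read_text(encoding="utf-8"))
    best = state.get("best_model_checkpoint")
    if not best:
        return None
    best_dir = Path(best)
    if not best_dir.is_dir():
        # The stored path may be relative to where training ran; fall back to output_dir.
        best_dir = output_dir / best_dir.name
    # A PEFT adapter directory is expected to contain adapter_config.json.
    return best_dir if (best_dir / "adapter_config.json").is_file() else None
```

If the function returns None, no best checkpoint was recorded (for example, because no evaluation metric was configured), and you would need to choose the adapter directory manually.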
