Direct Preference Optimization (DPO)
What is Direct Preference Optimization (DPO)?
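Direct Preference Optimization (DPO), introduced by Rafailov et al. (2023), fine-tunes a language model on pairs of responses to the same prompt: one preferred ("chosen") and one dispreferred ("rejected"). It does this without training a separate reward model or running reinforcement learning. For each pair it minimizes -log σ(β · [(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]). Here π is the model being trained, π_ref is a frozen copy of the starting model, y_w and y_l are the chosen and rejected responses to prompt x, and β controls how far π may drift from π_ref. The sketch below computes that loss for a single pair from precomputed log-probabilities. It only illustrates the objective and is not gllm-training's implementation. The function names and the default beta=0.1 are our own choices.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)) that avoids overflow in exp()."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed default; tune per run
) -> float:
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response given
    the prompt, under either the trainable policy or the frozen reference.
    """
    # Log-ratio of policy to reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Scaled margin: positive when the policy favors the chosen response
    # more strongly than the reference model does.
    margin = beta * (chosen_ratio - rejected_ratio)
    return -log_sigmoid(margin)
```

When the policy and reference give identical log-probabilities, the margin is zero and the loss equals log 2 (about 0.693). Raising the chosen response's log-probability relative to the reference, or lowering the rejected one's, pushes the loss below that value.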
Installation
Linux or macOS (bash/zsh):
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-training"
Windows (Command Prompt; inside a .bat script, write %%T instead of %T):
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-training"
Quickstart
# Minimal DPO fine-tuning run
from gllm_training import DPOTrainer

dpo_trainer = DPOTrainer(
    model_name="Qwen/Qwen3-0.6B",  # Hugging Face model ID
    datasets_path="examples/dpo_csv",  # local path to the preference dataset
)
dpo_trainer.train()
Fine-tuning a model using a YAML file
Example 1: Fine-tuning using online data
Example 2: Fine-tuning using local data
Dataset Format
Training and validation data
Column Name
Description
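DPO data is conventionally stored as one row per preference pair, holding a prompt, a preferred response, and a dispreferred response. The quickstart's examples/dpo_csv path suggests CSV input. As a hedged sketch under that assumption, the function below checks a CSV before training. The column names prompt, chosen, and rejected are hypothetical placeholders, so replace REQUIRED_COLUMNS with the names listed in the table above.

```python
import csv
import io

# Hypothetical column names; substitute the ones from the dataset table.
REQUIRED_COLUMNS = ("prompt", "chosen", "rejected")


def load_preference_rows(csv_text: str) -> list[dict]:
    """Parse preference pairs from CSV text, rejecting missing or empty fields."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row_no, row in enumerate(reader, start=1):  # counts data rows only
        # A short row yields None for absent fields; treat it as empty.
        if any(not (row[c] or "").strip() for c in REQUIRED_COLUMNS):
            raise ValueError(f"empty required field in data row {row_no}")
        rows.append({c: row[c] for c in REQUIRED_COLUMNS})
    return rows
```

To check a file on disk, read it with open(path, newline="", encoding="utf-8") and pass its contents to load_preference_rows.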
Prompts
Column Name
Description
Logging and Monitoring
JSONL Logs (Structured Training Metrics)
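JSONL logs store one JSON object per line, so the standard library can parse them without extra dependencies. Here is a minimal sketch for extracting one metric's curve from such a log. It assumes each record has a step field and metric keys such as loss. These field names are hypothetical, so compare them against a real log from your run.

```python
import json
from typing import Iterable


def read_metric(lines: Iterable[str], key: str = "loss") -> list[tuple[int, float]]:
    """Collect (step, value) pairs for one metric from JSONL log lines.

    Blank lines and records lacking either the metric or a "step" field
    (assumed field name) are skipped.
    """
    points = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if key in record and "step" in record:
            points.append((int(record["step"]), float(record[key])))
    return points
```

Because read_metric accepts any iterable of strings, an open file handle works directly, for example read_metric(open(log_path), "loss"), where log_path is the location of your JSONL log.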
TensorBoard Logs (Visual Monitoring)
Log Configuration
Best Practices
Upload the model to cloud storage
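This page does not show how gllm-training performs the upload itself. The installation step already relies on gcloud, so the following sketch assumes Google Cloud Storage and the google-cloud-storage client library. The bucket name, object prefix, and helper names are hypothetical. storage.Client() picks up Application Default Credentials, which you can set up with gcloud auth application-default login.

```python
from pathlib import Path


def plan_uploads(model_dir: str, prefix: str) -> list[tuple[Path, str]]:
    """Map every file under model_dir to an object name under prefix."""
    root = Path(model_dir)
    prefix = prefix.strip("/")
    plan = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()  # forward slashes on all OSes
            plan.append((path, f"{prefix}/{rel}" if prefix else rel))
    return plan


def upload_to_gcs(model_dir: str, bucket_name: str, prefix: str) -> None:
    """Upload a saved model directory to a (hypothetical) GCS bucket."""
    # Imported lazily so plan_uploads works without the client installed:
    # pip install google-cloud-storage
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    for local_path, object_name in plan_uploads(model_dir, prefix):
        bucket.blob(object_name).upload_from_filename(str(local_path))
```

Splitting the path mapping from the network calls lets you check which objects would be written, by calling plan_uploads alone, before anything is uploaded.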