Direct Preference Optimization (DPO)
What is Direct Preference Optimization (DPO)?
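Direct Preference Optimization (DPO) fine-tunes a language model on preference pairs. Each pair has a prompt, a preferred ("chosen") response and a dispreferred ("rejected") response. DPO does not train a separate reward model or run reinforcement learning. Instead, it minimizes a classification-style loss that raises the policy's likelihood of the chosen response, and lowers it for the rejected one, relative to a frozen reference model. The code below is a minimal sketch of that standard per-pair objective, written from sequence log-probabilities. It is not gllm-training's internal implementation. The name `dpo_loss`, its arguments and the illustrative `beta = 0.1` default are hypothetical.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value; controls how far the policy may drift from the reference
) -> float:
    """Per-pair DPO loss: -log(sigmoid(chosen_reward - rejected_reward)).

    Each *_logp argument is the summed log-probability of a full response given
    the prompt, under the trainable policy or the frozen reference model.
    """
    # Implicit rewards: how much more the policy favors each response than the reference does.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    z = chosen_reward - rejected_reward
    # -log(sigmoid(z)) == log(1 + exp(-z)); split on the sign of z so exp() never overflows.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

Consider `dpo_loss(-10.0, -12.0, -10.0, -12.0)`. Here the policy and the reference agree exactly, so the loss is log 2 (about 0.693). The call `dpo_loss(-8.0, -12.0, -10.0, -12.0)` raises the policy's log-probability of the chosen response, which pushes the loss below log 2.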
Installation
Install gllm-training from the internal package index. The command authenticates with a Google Cloud access token, so the gcloud CLI must be installed and authenticated.

Linux, macOS, or Windows PowerShell:
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-training"

Windows Command Prompt:
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-training"

Quickstart
# Fine-tune Qwen3-0.6B with DPO on the example preference dataset in examples/dpo_csv
from gllm_training import DPOTrainer

dpo_trainer = DPOTrainer(
    model_name="Qwen/Qwen3-0.6B",
    datasets_path="examples/dpo_csv",
)
dpo_trainer.train()
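The quickstart reads its training data from `examples/dpo_csv`, but this section does not document the exact schema gllm-training expects. The sketch below assumes the common preference-data layout: one row per example, with `prompt`, `chosen` and `rejected` columns. This is the convention used by libraries such as Hugging Face TRL. The helper `preference_rows_to_csv` and the column names are assumptions, so check them against the files shipped in `examples/dpo_csv` before training.

```python
import csv
import io

# ASSUMPTION: column names follow the common prompt/chosen/rejected convention;
# verify them against the files in examples/dpo_csv.
FIELDS = ["prompt", "chosen", "rejected"]


def preference_rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize preference pairs to CSV text (hypothetical helper, not part of gllm-training)."""
    # Validate every row first: a DPO pair is useless without both responses.
    for i, row in enumerate(rows):
        missing = [key for key in FIELDS if not row.get(key)]
        if missing:
            raise ValueError(f"row {i} is missing {missing}")
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        # Extra keys are dropped; commas and quotes inside values are escaped by the csv module.
        writer.writerow({key: row[key] for key in FIELDS})
    return buf.getvalue()
```

To build a local dataset directory, write the returned text to a file inside it, opened with `open(path, "w", newline="", encoding="utf-8")`. Then pass that directory as `datasets_path`. The file name and directory are up to you and are not confirmed by this page.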
Fine-tuning a model using a YAML file
Example 1: Fine-tuning using online data
Example 2: Fine-tuning using local data
Upload model to cloud storage
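This section does not show how gllm-training itself uploads models. The sketch below shows one possible approach and makes two assumptions. First, the fine-tuned model has been saved to a local directory. Second, the target is Google Cloud Storage, which the installation step already relies on through the gcloud CLI. Under those assumptions, the sketch uploads every file in the directory with the `google-cloud-storage` client library. The helper names `object_name` and `upload_model_dir` are hypothetical and not part of gllm-training. The bucket, prefix and directory are placeholders.

```python
from pathlib import Path


def object_name(relative_path: Path, prefix: str) -> str:
    """Build a bucket object name from a file's path relative to the model directory."""
    rel = relative_path.as_posix()  # object names always use forward slashes
    prefix = prefix.strip("/")
    return f"{prefix}/{rel}" if prefix else rel


def upload_model_dir(bucket_name: str, local_dir: str, prefix: str) -> None:
    """Upload every file under local_dir to gs://<bucket_name>/<prefix>/ (hypothetical helper)."""
    # Imported lazily so object_name works even without google-cloud-storage installed.
    from google.cloud import storage

    # storage.Client() uses Application Default Credentials,
    # e.g. from `gcloud auth application-default login`.
    bucket = storage.Client().bucket(bucket_name)
    root = Path(local_dir)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            blob = bucket.blob(object_name(path.relative_to(root), prefix))
            blob.upload_from_filename(str(path))
```

For example, `upload_model_dir("my-bucket", "outputs/my-dpo-model", "models/qwen3-dpo")` would keep the directory layout under the given prefix. All three arguments are placeholders.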