Group Relative Policy Optimization (GRPO)
What is Group Relative Policy Optimization (GRPO)?
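As commonly described, GRPO samples a group of completions for each prompt, scores each one with the reward functions, and uses the reward of each completion relative to the rest of its group as the training signal. The sketch below shows only that normalization step. The function name `group_relative_advantages`, the `eps` parameter, and the use of the population standard deviation are assumptions for illustration, not the `gllm-training` implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one prompt's completions against their group.

    Hypothetical helper for illustration only; not part of gllm-training.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored 1.0 or 0.0 by a reward function
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Completions that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline.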
Installation
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-training"
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-training"
Quickstart
# Main Code
from gllm_training import GRPOTrainer
from examples.reward_function.llm_as_judge_reward_function import output_format_reward

# Configure the trainer with a base model, a dataset directory, and reward functions.
grpo_trainer = GRPOTrainer(
    model_name="Qwen/Qwen3-0.6b",
    datasets_path="examples/grpo_csv",
    reward_functions=[output_format_reward],
)

# Start GRPO fine-tuning.
grpo_trainer.train()
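The quickstart passes `output_format_reward` in `reward_functions`, but its body is not shown here. As a rough sketch of what a format reward might look like, the example below scores each completion by whether it is wrapped in `<answer>` tags. The name `example_format_reward`, the tag format, and the signature (a list of completion strings in, a list of floats out) are assumptions; check the `gllm-training` examples for the signature the trainer actually expects.

```python
import re

# Hypothetical pattern: the whole completion must be wrapped in <answer> tags.
ANSWER_PATTERN = re.compile(r"^<answer>.+</answer>$", re.DOTALL)


def example_format_reward(completions, **kwargs):
    """Return 1.0 for each completion wrapped in <answer>...</answer>, else 0.0.

    Assumed signature for illustration; not confirmed for gllm-training.
    """
    return [
        1.0 if ANSWER_PATTERN.match(text.strip()) else 0.0
        for text in completions
    ]


scores = example_format_reward(["<answer>42</answer>", "42"])
```

A binary reward like this is easy to reason about; a graded score (for example, partial credit for an opening tag) is another option if the model rarely produces the full format early in training.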
Fine-tuning a model using a YAML file
Example 1: Fine-tuning using online data
Example 2: Fine-tuning using local data
Datasets Format
Training and validation data
Column Name
Description
Prompts
Column Name
Description
Logging and Monitoring
JSONL Logs (Structured Training Metrics)
Tensorboard Logs (Visual Monitoring)
Log Configuration
Best Practices
Upload model to cloud storage