Translation
This guide explains translation and how to use gllm-intl to localize your applications with message catalogs and multiple locales.
Installation
Linux or macOS shell:
# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" python-dotenv gllm-intl
Windows Command Prompt:
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" python-dotenv gllm-intl
What is Translation?
Translation in the context of internationalization (i18n) is the process of adapting text, messages, and content from one language to another while preserving meaning and context. Unlike transliteration (which converts sounds between scripts), translation converts the meaning and intent of messages.
Key Concepts:
Message ID (msgid): A unique key identifying a translatable message (e.g., "greeting")
Message String (msgstr): The translated text for a specific locale (e.g., "Hello" in English, "Halo" in Indonesian)
Locale: A language and regional identifier (e.g., en_US for US English, id_ID for Indonesian)
Context: Additional information to disambiguate identical keys with different meanings
Pluralization: Different message forms based on quantity (e.g., "1 item" vs "2 items")
Examples:
| Message ID | English (en_US) | Indonesian (id_ID) | French (fr_FR) |
| --- | --- | --- | --- |
| greeting | Hello | Halo | Bonjour |
| welcome_user | Welcome, {name}! | Selamat datang, {name}! | Bienvenue, {name}! |
| item (singular) | 1 item | 1 item | 1 article |
| item (plural) | {count} items | {count} item | {count} articles |
The gllm-intl library uses GNU gettext format via Babel for industry-standard translation catalog management.
What is a Translation Catalog?
A translation catalog is a structured database of translated messages organized by locale. Catalogs use the gettext format, an industry standard for software localization.
Catalog File Types:
.po (Portable Object) - Human-readable source file
.mo (Machine Object) - Compiled binary file used at runtime (generated from .po)
Catalog Structure:
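To make the structure concrete, here is a minimal sketch that uses only Python's standard-library gettext module rather than the gllm-intl API. It packs a handful of entries into an in-memory .mo file and reads them back. The helper build_mo, the "open" message, and its two translations are invented for this illustration; the other entries come from the examples table above.

```python
import gettext
import io
import struct


def build_mo(messages: dict[str, str]) -> bytes:
    """Pack msgid -> msgstr pairs into GNU .mo bytes (hypothetical helper)."""
    items = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in messages.items())
    n = len(items)
    ids_start = 28 + 16 * n  # 7-word header plus two (length, offset) tables
    strs_start = ids_start + sum(len(key) + 1 for key, _ in items)
    id_table, str_table = [], []
    id_pos, str_pos = ids_start, strs_start
    for key, value in items:
        id_table += [len(key), id_pos]
        str_table += [len(value), str_pos]
        id_pos += len(key) + 1  # +1 for the NUL terminator
        str_pos += len(value) + 1
    header = struct.pack("<7I", 0x950412DE, 0, n, 28, 28 + 8 * n, 0, 0)
    tables = struct.pack(f"<{4 * n}I", *id_table, *str_table)
    ids = b"".join(key + b"\0" for key, _ in items)
    strs = b"".join(value + b"\0" for _, value in items)
    return header + tables + ids + strs


messages = {
    # Metadata lives in the entry whose msgid is the empty string.
    "": "Content-Type: text/plain; charset=UTF-8\n"
        "Plural-Forms: nplurals=2; plural=(n != 1);\n",
    "greeting": "Hello",                            # simple message
    "welcome_user": "Welcome, {name}!",             # variable interpolation
    "item\0items": "{count} item\0{count} items",   # plural forms, NUL-separated
    "file\x04open": "Open file",                    # msgctxt "file" + msgid "open"
    "person\x04open": "Open-minded",                # msgctxt "person" + msgid "open"
}

catalog = gettext.GNUTranslations(io.BytesIO(build_mo(messages)))

print(catalog.gettext("greeting"))
print(catalog.gettext("welcome_user").format(name="Alice"))
for count in (1, 3):
    print(catalog.ngettext("item", "items", count).format(count=count))
print(catalog.pgettext("file", "open"))
print(catalog.pgettext("person", "open"))
```

In a .po source file the same entries appear as msgid/msgstr pairs, with msgid_plural and msgctxt lines for the plural and contextual cases.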
Catalog Features:
| Feature | Description | Example |
| --- | --- | --- |
| Simple Messages | Basic key-value translations | msgid "greeting" → msgstr "Hello" |
| Variable Interpolation | Dynamic content injection | "Welcome, {name}!" with name="Alice" |
| Plural Forms | Quantity-aware translations | msgid_plural "items" with count rules |
| Contextual Messages | Disambiguate identical keys | msgctxt "file" vs msgctxt "person" |
| Metadata | Language, encoding, plural rules | "Plural-Forms: nplurals=2; plural=(n != 1);" |
Why Use Translation Catalogs?
1. Multi-Language Support
Serve users in their preferred language:
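As a rough illustration of the idea, independent of gllm-intl, the sketch below keeps one dictionary per locale and resolves the same message ID in each. The translate function and the CATALOGS mapping are hypothetical stand-ins for a real catalog lookup; the strings are the ones from the examples table.

```python
# Hypothetical in-memory catalogs keyed by locale, then by message ID.
CATALOGS = {
    "en_US": {"greeting": "Hello", "welcome_user": "Welcome, {name}!"},
    "id_ID": {"greeting": "Halo", "welcome_user": "Selamat datang, {name}!"},
    "fr_FR": {"greeting": "Bonjour", "welcome_user": "Bienvenue, {name}!"},
}


def translate(locale: str, key: str, **variables: str) -> str:
    """Look up `key` for `locale` and fill in any {placeholders}."""
    return CATALOGS[locale][key].format(**variables)


for locale in CATALOGS:
    print(locale, "->", translate(locale, "welcome_user", name="Alice"))
```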
2. Centralized Management
Separate translatable content from code:
3. Professional Translation Workflow
Enable translators to work independently:
4. Plural Form Handling
Handle language-specific pluralization rules automatically:
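The rules differ more than English speakers tend to expect. The sketch below hard-codes the standard gettext Plural-Forms expressions for English, French, and Indonesian as plain Python functions. In a real catalog these rules come from the Plural-Forms header, so PLURAL_RULES, ITEM_FORMS, and format_items are illustrative names only.

```python
from typing import Callable

# Each rule maps a count to the index of the plural form to use.
PLURAL_RULES: dict[str, Callable[[int], int]] = {
    "en": lambda n: int(n != 1),  # nplurals=2; plural=(n != 1)
    "fr": lambda n: int(n > 1),   # nplurals=2; plural=(n > 1)
    "id": lambda n: 0,            # nplurals=1; plural=0
}

ITEM_FORMS: dict[str, list[str]] = {
    "en": ["{count} item", "{count} items"],
    "fr": ["{count} article", "{count} articles"],
    "id": ["{count} item"],
}


def format_items(language: str, count: int) -> str:
    """Pick the plural form for `count` and interpolate it."""
    form_index = PLURAL_RULES[language](count)
    return ITEM_FORMS[language][form_index].format(count=count)


for language in ("en", "fr", "id"):
    print(language, [format_items(language, n) for n in (0, 1, 2)])
```

Note that zero takes the plural form in English but the singular form in French, and Indonesian uses a single form throughout.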
5. Context Disambiguation
Differentiate identical words with different meanings:
Quick Start
Translate in 3 steps:
Setting Up Translation Directory Structure
The gllm-intl library uses Babel-style gettext catalogs for storing translations. Each locale requires a specific directory structure.
Directory Structure
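A minimal sketch of that layout, assuming the conventional gettext arrangement of one LC_MESSAGES folder per locale under a locales directory (the same shape as the file paths used later in this guide). The create_locale_dirs helper is hypothetical and works in a temporary directory so it can be run safely.

```python
import tempfile
from pathlib import Path


def create_locale_dirs(root: Path, locales: list[str]) -> list[Path]:
    """Create <root>/locales/<locale>/LC_MESSAGES for each locale."""
    created = []
    for locale in locales:
        path = root / "locales" / locale / "LC_MESSAGES"
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


with tempfile.TemporaryDirectory() as tmp:
    for path in create_locale_dirs(Path(tmp), ["en_US", "id_ID"]):
        # Each folder will hold messages.po and the compiled messages.mo.
        print(path.relative_to(tmp).as_posix())
```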
Creating Translation Files
Create the Directory Structure
Create .po Files (Source Translation Files)
locales/en_US/LC_MESSAGES/messages.po:
locales/id_ID/LC_MESSAGES/messages.po:
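For orientation, the sketch below generates two small .po files at those paths and prints them. It is an assumption about what such files typically contain, not the guide's original listings: render_po and write_catalogs are hypothetical helpers, and the header is reduced to the Language, Content-Type, and Plural-Forms fields.

```python
import tempfile
from pathlib import Path


def render_po(language: str, plural_forms: str, entries: dict[str, str]) -> str:
    """Render a minimal .po file. The sample strings need no escaping."""
    lines = [
        'msgid ""',
        'msgstr ""',
        f'"Language: {language}\\n"',
        '"Content-Type: text/plain; charset=UTF-8\\n"',
        f'"Plural-Forms: {plural_forms}\\n"',
    ]
    for msgid, msgstr in entries.items():
        lines += ["", f'msgid "{msgid}"', f'msgstr "{msgstr}"']
    return "\n".join(lines) + "\n"


SAMPLES = {
    "en_US": ("nplurals=2; plural=(n != 1);",
              {"greeting": "Hello", "welcome_user": "Welcome, {name}!"}),
    "id_ID": ("nplurals=1; plural=0;",
              {"greeting": "Halo", "welcome_user": "Selamat datang, {name}!"}),
}


def write_catalogs(root: Path) -> list[Path]:
    """Write one messages.po per sample locale under <root>/locales."""
    written = []
    for locale, (plural_forms, entries) in SAMPLES.items():
        po_path = root / "locales" / locale / "LC_MESSAGES" / "messages.po"
        po_path.parent.mkdir(parents=True, exist_ok=True)
        po_path.write_text(render_po(locale, plural_forms, entries), encoding="utf-8")
        written.append(po_path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    for po_path in write_catalogs(Path(tmp)):
        print(f"--- {po_path.relative_to(tmp).as_posix()}")
        print(po_path.read_text(encoding="utf-8"))
```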
Compile .po Files to .mo Files
The library requires compiled .mo files. Use GNU gettext's msgfmt or Babel's pybabel compile command to generate them from the .po sources:
You may need to prefix the command with poetry run or uv run.
Alternative: Programmatic Catalog Creation (Testing/Development)
For testing or development, you can create catalogs programmatically using Babel:
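One possible shape for this, sketched with Babel's Catalog and write_mo and verified with the standard-library gettext loader rather than gllm-intl itself. The write_catalog helper and the "item"/"items" message IDs are assumptions made for the example; it requires Babel to be installed.

```python
import gettext
import tempfile
from pathlib import Path

from babel.messages.catalog import Catalog
from babel.messages.mofile import write_mo


def write_catalog(locales_dir: Path, locale: str, greeting: str,
                  item_forms: tuple[str, ...]) -> Path:
    """Build a catalog in memory and write it as a compiled messages.mo."""
    catalog = Catalog(locale=locale)
    catalog.add("greeting", greeting)
    # A tuple msgid marks the message as pluralizable.
    catalog.add(("item", "items"), item_forms)
    mo_path = locales_dir / locale / "LC_MESSAGES" / "messages.mo"
    mo_path.parent.mkdir(parents=True, exist_ok=True)
    with mo_path.open("wb") as fileobj:
        write_mo(fileobj, catalog)
    return mo_path


with tempfile.TemporaryDirectory() as tmp:
    locales_dir = Path(tmp) / "locales"
    write_catalog(locales_dir, "en_US", "Hello", ("{count} item", "{count} items"))
    write_catalog(locales_dir, "id_ID", "Halo", ("{count} item",))

    for locale in ("en_US", "id_ID"):
        # Load the compiled file back to confirm the layout is discoverable.
        translations = gettext.translation(
            "messages", localedir=str(locales_dir), languages=[locale]
        )
        plural = translations.ngettext("item", "items", 3).format(count=3)
        print(locale, translations.gettext("greeting"), plural)
```

Because the catalog is written straight to .mo, no separate compile step is needed, which is convenient in test fixtures.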
Initializing a Translation Provider
The FileSystemLocaleProvider discovers and loads translation catalogs from your directory structure.
Basic Initialization
Configuration Options
locales_dir (required): Absolute or relative path to the directory containing locale subdirectories.
backend (optional): Translation backend identifier. Default: "babel".
default_locale (optional): Locale to use when requested locale is unavailable. Default: "en".
strict_mode (optional):
  False (default): Returns key or empty string for missing translations
  True: Raises LocaleNotFoundError or TranslationKeyError for missing resources
backend_config (optional): Dictionary of backend-specific configuration:
  domain: Gettext domain name (default: "messages")
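To show how default_locale and strict_mode might interact, here is a small stand-alone sketch. It is one reading of the option descriptions above, not the library's implementation: the lookup function and CATALOGS are hypothetical, and the two exception classes are local stand-ins that only borrow the names mentioned in the list.

```python
class LocaleNotFoundError(LookupError):
    """Local stand-in for the exception named in the option list."""


class TranslationKeyError(KeyError):
    """Local stand-in for the exception named in the option list."""


# Hypothetical catalogs; a real provider would load these from .mo files.
CATALOGS = {
    "en": {"greeting": "Hello"},
    "id_ID": {"greeting": "Halo"},
}


def lookup(key: str, locale: str, *, default_locale: str = "en",
           strict_mode: bool = False) -> str:
    """Resolve `key`, falling back or raising depending on strict_mode."""
    catalog = CATALOGS.get(locale)
    if catalog is None:
        if strict_mode:
            raise LocaleNotFoundError(locale)
        # Assumed behavior: fall back to the default locale's catalog.
        catalog = CATALOGS.get(default_locale, {})
    if key not in catalog:
        if strict_mode:
            raise TranslationKeyError(key)
        return key  # non-strict: return the key itself
    return catalog[key]


print(lookup("greeting", "id_ID"))   # known locale and key
print(lookup("greeting", "fr_FR"))   # unknown locale, falls back to "en"
print(lookup("farewell", "id_ID"))   # unknown key, returns the key
```

Strict mode tends to suit tests and CI, where a missing translation should fail loudly, while the lenient default keeps production pages rendering.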
Checking Available Locales
Direct Provider Usage (Without Manager)
You can use the provider directly with explicit locale parameters:
Defining Translations in Code
There are three ways to use translations in your code: direct provider usage, TranslationManager, and shorthand functions.
Method 1: Direct Provider Usage (Explicit Locale)
Best for: Applications that need explicit control over locale per operation.
Method 2: Translation Manager (Current Locale State)
Best for: Applications with a stable current locale that changes infrequently.
⚠️ Thread Safety Note: TranslationManager instances are not thread-safe. For multi-threaded applications:
Create a manager instance per request/thread, OR
Use the provider directly with explicit locale parameters, OR
Use the global context API (Method 3)
Method 3: Shorthand Functions (Global Context)
Best for: Web applications and multi-threaded environments with thread-local state.
Setting Locale Globally and in Context
Global Configuration (Application Startup)
Configure the i18n system once at application startup:
⚠️ Important: By default, configure_i18n() can only be called once. To reconfigure (e.g., in tests):
Setting Locale for Current Thread
Each thread maintains its own locale context:
Temporary Locale Context (Context Manager)
Use locale_context() to temporarily switch locales:
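The general pattern behind such a context manager can be sketched with the standard-library contextvars module. This is not the gllm-intl implementation; temporary_locale and get_locale are hypothetical names chosen to avoid implying the library's actual signatures.

```python
import contextvars
from contextlib import contextmanager
from typing import Iterator

# Holds the active locale; "en_US" is an assumed default for this sketch.
_current_locale = contextvars.ContextVar("current_locale", default="en_US")


def get_locale() -> str:
    return _current_locale.get()


@contextmanager
def temporary_locale(locale: str) -> Iterator[str]:
    """Switch the locale inside a with-block, then restore the previous one."""
    token = _current_locale.set(locale)
    try:
        yield locale
    finally:
        _current_locale.reset(token)  # restores even if the block raises


print(get_locale())              # en_US
with temporary_locale("id_ID"):
    print(get_locale())          # id_ID
    with temporary_locale("fr_FR"):
        print(get_locale())      # fr_FR
    print(get_locale())          # id_ID
print(get_locale())              # en_US
```

Resetting via the token rather than setting a fixed value is what makes nested blocks unwind correctly.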
Multi-Threaded Usage
Each thread has independent locale context:
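The isolation can be demonstrated with threading.local from the standard library. The sketch assumes nothing about how gllm-intl stores its state; set_locale, get_locale, and GREETINGS are hypothetical, and the barrier only exists to force both threads to set their locale before either reads it back.

```python
import threading

_state = threading.local()  # each thread sees its own attributes


def set_locale(locale: str) -> None:
    _state.locale = locale


def get_locale() -> str:
    return getattr(_state, "locale", "en_US")  # assumed default


GREETINGS = {"en_US": "Hello", "id_ID": "Halo", "fr_FR": "Bonjour"}
results: dict[str, str] = {}
barrier = threading.Barrier(2)


def worker(locale: str) -> None:
    set_locale(locale)
    barrier.wait()  # both threads have now set a different locale
    results[locale] = GREETINGS[get_locale()]


threads = [threading.Thread(target=worker, args=(loc,)) for loc in ("id_ID", "fr_FR")]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(results)
print(get_locale())  # the main thread still sees the default
```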
Web Application Example (Flask)
Best Practices
1. Use Message IDs, Not English Text
Use descriptive keys instead of full English text:
2. Keep Variable Names Consistent
Use the same variable names across all locales:
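A mismatch here usually surfaces as a KeyError at format time in only one language, so it is worth checking mechanically. The sketch below compares placeholder names across locales using string.Formatter; placeholders and find_mismatches are hypothetical helpers, and the misspelled {nama} is a deliberate example.

```python
from string import Formatter


def placeholders(message: str) -> set[str]:
    """Return the named {fields} used in a format string."""
    return {field for _, field, _, _ in Formatter().parse(message) if field}


def find_mismatches(translations: dict[str, str]) -> dict[str, set[str]]:
    """Return locales whose placeholders differ from the first locale's."""
    entries = iter(translations.items())
    _, reference_text = next(entries)
    expected = placeholders(reference_text)
    return {
        locale: placeholders(text)
        for locale, text in entries
        if placeholders(text) != expected
    }


welcome_user = {
    "en_US": "Welcome, {name}!",
    "id_ID": "Selamat datang, {nama}!",  # wrong: variable name was translated
    "fr_FR": "Bienvenue, {name}!",
}
print(find_mismatches(welcome_user))
```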
3. Provide Context for Ambiguous Words
Use msgctxt to disambiguate:
4. Handle Pluralization Properly
Always use plural-aware functions for counts:
5. Configure Once, Set Locale Per Request
In web applications, configure globally and set locale per request:
6. Use Lazy Translations for Module-Level Strings
For strings defined at module level that need runtime evaluation:
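The underlying idea is to store the message ID and defer the lookup until the string is actually rendered. The sketch below shows that pattern in plain Python. LazyString, set_locale, and the save_button entries are hypothetical and do not reflect the names of the library's lazy helpers.

```python
# Hypothetical catalogs and locale state for the illustration.
CATALOGS = {
    "en_US": {"save_button": "Save"},
    "id_ID": {"save_button": "Simpan"},
}
_state = {"locale": "en_US"}


def set_locale(locale: str) -> None:
    _state["locale"] = locale


class LazyString:
    """Holds a message ID and translates it only when converted to str."""

    def __init__(self, key: str) -> None:
        self._key = key

    def __str__(self) -> str:
        return CATALOGS[_state["locale"]][self._key]


# Defined at import time, before any request-specific locale is known.
SAVE_LABEL = LazyString("save_button")

set_locale("en_US")
print(str(SAVE_LABEL))   # Save
set_locale("id_ID")
print(f"{SAVE_LABEL}")   # Simpan
```

An eager lookup at module level would have frozen the label in whatever locale was active at import time.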
7. Test with Multiple Locales
Include locale switching in your tests:
8. Document Your Message IDs
Keep a reference document of message IDs and their purpose:
9. Handle Missing Translations Gracefully
Set up appropriate fallbacks:
10. Extract and Update Catalogs Regularly
Maintain up-to-date translation files: