Translation

This guide explains translation and how to use gllm-intl to localize your applications with message catalogs and multiple locales.

Installation

# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" python-dotenv gllm-intl

What is Translation?

Translation in the context of internationalization (i18n) is the process of adapting text, messages, and content from one language to another while preserving meaning and context. Unlike transliteration (which maps text from one script to another without changing the language), translation converts the meaning and intent of messages.

Key Concepts:

  • Message ID (msgid): A unique key identifying a translatable message (e.g., "greeting")

  • Message String (msgstr): The translated text for a specific locale (e.g., "Hello" in English, "Halo" in Indonesian)

  • Locale: A language and regional identifier (e.g., en_US for US English, id_ID for Indonesian)

  • Context: Additional information to disambiguate identical keys with different meanings

  • Pluralization: Different message forms based on quantity (e.g., "1 item" vs "2 items")

Examples:

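| Message ID      | English (en_US)  | Indonesian (id_ID)      | French (fr_FR)     |
|-----------------|------------------|-------------------------|--------------------|
| greeting        | Hello            | Halo                    | Bonjour            |
| welcome_user    | Welcome, {name}! | Selamat datang, {name}! | Bienvenue, {name}! |
| item (singular) | 1 item           | 1 item                  | 1 article          |
| item (plural)   | {count} items    | {count} item            | {count} articles   |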

The gllm-intl library uses GNU gettext format via Babel for industry-standard translation catalog management.
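The examples above can be read as a two-level lookup: first by locale, then by message ID, followed by placeholder substitution. The following is a minimal sketch with plain dictionaries, not the gllm-intl API; `CATALOGS` and `translate` are hypothetical names used only for illustration.

```python
# Illustrative in-memory catalogs mirroring the examples above.
# CATALOGS and translate() are hypothetical names, not part of gllm-intl.
CATALOGS = {
    "en_US": {"greeting": "Hello", "welcome_user": "Welcome, {name}!"},
    "id_ID": {"greeting": "Halo", "welcome_user": "Selamat datang, {name}!"},
    "fr_FR": {"greeting": "Bonjour", "welcome_user": "Bienvenue, {name}!"},
}


def translate(msgid: str, locale: str, **variables: str) -> str:
    """Look up msgid for the given locale and fill in {placeholders}."""
    template = CATALOGS[locale][msgid]
    return template.format(**variables)


print(translate("greeting", "id_ID"))
print(translate("welcome_user", "fr_FR", name="Alice"))
```

A real catalog replaces the dictionaries with compiled gettext files, but the lookup order stays the same.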


What is a Translation Catalog?

A translation catalog is a structured database of translated messages organized by locale. Catalogs use the gettext format, an industry standard for software localization.

Catalog File Types:

  1. .po (Portable Object) - Human-readable source file edited by translators

  2. .mo (Machine Object) - Compiled binary file used at runtime (generated from .po)

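Catalog Structure:

A .po file begins with a header entry (an empty msgid whose msgstr carries metadata such as language, encoding, and plural rules), followed by one entry per message. Each entry has an optional msgctxt, a msgid, an optional msgid_plural, and one or more msgstr values.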

Catalog Features:

| Feature                | Description                      | Example                                      |
|------------------------|----------------------------------|----------------------------------------------|
| Simple Messages        | Basic key-value translations     | msgid "greeting" msgstr "Hello"              |
| Variable Interpolation | Dynamic content injection        | "Welcome, {name}!" with name="Alice"         |
| Plural Forms           | Quantity-aware translations      | msgid_plural "items" with count rules        |
| Contextual Messages    | Disambiguate identical keys      | msgctxt "file" vs msgctxt "person"           |
| Metadata               | Language, encoding, plural rules | "Plural-Forms: nplurals=2; plural=(n != 1);" |

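To see how these features behave at runtime, the sketch below builds a tiny compiled catalog in memory and reads it back with Python's standard `gettext` module. It uses only the standard library and is not the gllm-intl API. `build_mo` is a hypothetical helper written for this example, and the "name" entries are invented sample data; in a real project the .mo file is compiled from a .po file.

```python
import gettext
import io
import struct


def build_mo(messages: dict[bytes, bytes]) -> bytes:
    """Serialize msgid -> msgstr pairs into the GNU .mo binary layout."""
    keys = sorted(messages)  # the format expects msgids in sorted order
    ids = b"".join(k + b"\x00" for k in keys)
    strs = b"".join(messages[k] + b"\x00" for k in keys)
    key_start = 7 * 4 + 16 * len(keys)  # header plus two (length, offset) tables
    val_start = key_start + len(ids)
    key_table, val_table = [], []
    k_off = v_off = 0
    for k in keys:
        v = messages[k]
        key_table += [len(k), key_start + k_off]
        val_table += [len(v), val_start + v_off]
        k_off += len(k) + 1
        v_off += len(v) + 1
    header = struct.pack(
        "<7I", 0x950412DE, 0, len(keys), 7 * 4, 7 * 4 + 8 * len(keys), 0, 0
    )
    tables = struct.pack(f"<{4 * len(keys)}I", *key_table, *val_table)
    return header + tables + ids + strs


EN_MESSAGES = {
    # Metadata: the empty msgid holds the header, including the plural rule.
    b"": b"Content-Type: text/plain; charset=UTF-8\n"
         b"Plural-Forms: nplurals=2; plural=(n != 1);\n",
    b"greeting": b"Hello",                             # simple message
    b"welcome_user": b"Welcome, {name}!",              # variable interpolation
    b"item\x00items": b"{count} item\x00{count} items",  # plural forms
    b"file\x04name": b"File name",                     # msgctxt "file"
    b"person\x04name": b"Full name",                   # msgctxt "person"
}

en = gettext.GNUTranslations(io.BytesIO(build_mo(EN_MESSAGES)))

print(en.gettext("greeting"))
print(en.gettext("welcome_user").format(name="Alice"))
print(en.ngettext("item", "items", 3).format(count=3))
print(en.pgettext("file", "name"), "/", en.pgettext("person", "name"))
```

In the binary format, a context is stored as a prefix joined to the msgid with the byte 0x04, and plural forms are stored as NUL-separated strings, which is why the keys above look unusual.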

Why Use Translation Catalogs?

1. Multi-Language Support

Serve users in their preferred language:

2. Centralized Management

Separate translatable content from code:

3. Professional Translation Workflow

Enable translators to work independently:

4. Plural Form Handling

Handle language-specific pluralization rules automatically:
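As an illustration of why plural rules must be per-language, the sketch below hand-codes the standard gettext plural selectors for English, French, and Indonesian. The names `PLURAL_RULES`, `FORMS`, and `pluralize` are hypothetical; a gettext backend normally reads the rule from the catalog's Plural-Forms header instead.

```python
# Plural-form selectors as declared in gettext "Plural-Forms" headers.
# Each function returns the index of the msgstr form to use for count n.
PLURAL_RULES = {
    "en": lambda n: int(n != 1),  # nplurals=2; plural=(n != 1)
    "fr": lambda n: int(n > 1),   # nplurals=2; plural=(n > 1)
    "id": lambda n: 0,            # nplurals=1; plural=0
}

FORMS = {
    "en": ["{count} item", "{count} items"],
    "fr": ["{count} article", "{count} articles"],
    "id": ["{count} item"],
}


def pluralize(language: str, count: int) -> str:
    """Pick the plural form for count and substitute the number."""
    index = PLURAL_RULES[language](count)
    return FORMS[language][index].format(count=count)


for language in ("en", "fr", "id"):
    print(language, [pluralize(language, n) for n in (0, 1, 2)])
```

Note that zero takes the plural form in English but the singular form in French, and Indonesian has a single form for every count.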

5. Context Disambiguation

Differentiate identical words with different meanings:


Quick Start

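Translate in 3 steps: create and compile a catalog for each locale, configure the library with your locales directory, then set a locale and look up messages by ID.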


Setting Up Translation Directory Structure

The gllm-intl library uses Babel-style gettext catalogs for storing translations. Each locale requires a specific directory structure.

Directory Structure

Creating Translation Files

Create the Directory Structure

Create .po Files (Source Translation Files)

locales/en_US/LC_MESSAGES/messages.po:

locales/id_ID/LC_MESSAGES/messages.po:
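A minimal sketch of both files, written from Python so that the directory layout and the file contents are visible in one place. The message entries reuse the examples from earlier in this guide; `write_catalogs` is a hypothetical helper, and the header fields shown are a reasonable minimum rather than a complete gettext header.

```python
import tempfile
from pathlib import Path

EN_PO = r'''msgid ""
msgstr ""
"Language: en_US\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"

msgid "greeting"
msgstr "Hello"

msgid "welcome_user"
msgstr "Welcome, {name}!"

msgid "item"
msgid_plural "items"
msgstr[0] "{count} item"
msgstr[1] "{count} items"
'''

ID_PO = r'''msgid ""
msgstr ""
"Language: id_ID\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Plural-Forms: nplurals=1; plural=0;\n"

msgid "greeting"
msgstr "Halo"

msgid "welcome_user"
msgstr "Selamat datang, {name}!"

msgid "item"
msgid_plural "items"
msgstr[0] "{count} item"
'''


def write_catalogs(root: Path) -> list[Path]:
    """Create <root>/<locale>/LC_MESSAGES/messages.po for each locale."""
    written = []
    for locale, content in {"en_US": EN_PO, "id_ID": ID_PO}.items():
        po_path = root / locale / "LC_MESSAGES" / "messages.po"
        po_path.parent.mkdir(parents=True, exist_ok=True)
        po_path.write_text(content, encoding="utf-8")
        written.append(po_path)
    return written


# Written to a temporary directory here so the sketch has no side effects.
with tempfile.TemporaryDirectory() as tmp:
    created = [p.relative_to(tmp).as_posix() for p in write_catalogs(Path(tmp))]
print(created)
```

In a real project, call `write_catalogs(Path("locales"))` or simply create the same files by hand in a text editor.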

Compile .po Files to .mo Files

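The library requires compiled .mo files. Compile each .po file with GNU gettext's msgfmt (for example, msgfmt messages.po -o messages.mo) or with Babel's pybabel (for example, pybabel compile -d locales).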

Alternative: Programmatic Catalog Creation (Testing/Development)

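For testing or development, you can create catalogs programmatically using Babel, for example by building a babel.messages.catalog.Catalog and writing it out with babel.messages.mofile.write_mo.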


Initializing a Translation Provider

The FileSystemLocaleProvider discovers and loads translation catalogs from your directory structure.

Basic Initialization

Configuration Options

  • locales_dir (required): Absolute or relative path to the directory containing locale subdirectories.

  • backend (optional): Translation backend identifier. Default: "babel".

  • default_locale (optional): Locale to use when the requested locale is unavailable. Default: "en".

  • strict_mode (optional):

    • False (default): Returns the key or an empty string for missing translations

    • True: Raises LocaleNotFoundError or TranslationKeyError for missing resources

  • backend_config (optional): Dictionary of backend-specific configuration:

    • domain: Gettext domain name (default: "messages")

Checking Available Locales
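The provider's own call for listing locales is not reproduced here. As a rough stand-in, the sketch below scans a locales directory with the standard library, assuming that a locale counts as available when a compiled catalog exists for it. `available_locales` is a hypothetical helper, not a gllm-intl function.

```python
import tempfile
from pathlib import Path


def available_locales(locales_dir: Path, domain: str = "messages") -> list[str]:
    """Return locale names that have a compiled <domain>.mo catalog."""
    return sorted(
        mo.parent.parent.name
        for mo in locales_dir.glob(f"*/LC_MESSAGES/{domain}.mo")
    )


# Build a throwaway directory tree to demonstrate the scan.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for locale in ("id_ID", "en_US"):
        target = root / locale / "LC_MESSAGES"
        target.mkdir(parents=True)
        (target / "messages.mo").touch()
    (root / "fr_FR" / "LC_MESSAGES").mkdir(parents=True)  # no .mo compiled yet
    found = available_locales(root)
print(found)
```

A locale directory without a compiled .mo file, such as fr_FR above, is skipped, which is a common reason for a locale to be unexpectedly missing.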

Direct Provider Usage (Without Manager)

You can use the provider directly with explicit locale parameters:


Defining Translations in Code

There are three ways to use translations in your code: direct provider usage, TranslationManager, and shorthand functions.

Method 1: Direct Provider Usage (Explicit Locale)

Best for: Applications that need explicit control over locale per operation.

Method 2: Translation Manager (Current Locale State)

Best for: Applications with a stable current locale that changes infrequently.

⚠️ Thread Safety Note: TranslationManager instances are not thread-safe. For multi-threaded applications:

  • Create a manager instance per request/thread, OR

  • Use the provider directly with explicit locale parameters, OR

  • Use the global context API (Method 3)

Method 3: Shorthand Functions (Global Context)

Best for: Web applications and multi-threaded environments with thread-local state.


Setting Locale Globally and in Context

Global Configuration (Application Startup)

Configure the i18n system once at application startup:

⚠️ Important: By default, configure_i18n() can only be called once. To reconfigure (e.g., in tests):

Setting Locale for Current Thread

Each thread maintains its own locale context:

Temporary Locale Context (Context Manager)

Use locale_context() to temporarily switch locales:
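To illustrate the idea behind a temporary locale switch, here is a minimal standard-library sketch of a context manager that sets a thread-local locale and restores the previous value on exit. It is a conceptual stand-in, not the gllm-intl implementation; `temporary_locale` and `current_locale` are hypothetical names.

```python
import threading
from contextlib import contextmanager

_state = threading.local()
DEFAULT_LOCALE = "en"


def current_locale() -> str:
    """Return the calling thread's locale, or the default if none was set."""
    return getattr(_state, "locale", DEFAULT_LOCALE)


@contextmanager
def temporary_locale(locale: str):
    """Switch the calling thread's locale, restoring the old one on exit."""
    previous = current_locale()
    _state.locale = locale
    try:
        yield
    finally:
        _state.locale = previous  # restored even if the body raises


seen = [current_locale()]
with temporary_locale("id_ID"):
    seen.append(current_locale())
    with temporary_locale("fr_FR"):  # contexts can be nested
        seen.append(current_locale())
    seen.append(current_locale())
seen.append(current_locale())
print(seen)
```

Restoring in a `finally` clause is the important design choice: the outer locale comes back even when the code inside the block fails.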

Multi-Threaded Usage

Each thread has independent locale context:

Web Application Example (Flask)


Best Practices

1. Use Message IDs, Not English Text

Use descriptive keys instead of full English text:

2. Keep Variable Names Consistent

Use the same variable names across all locales:
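A mismatch in placeholder names usually surfaces as a runtime KeyError in only one locale, so it is worth checking mechanically. The sketch below uses the standard library's `string.Formatter` to compare placeholders across locales; `placeholders` and `inconsistent_locales` are hypothetical helpers, and the sample data contains a deliberate mistake.

```python
from string import Formatter


def placeholders(template: str) -> set[str]:
    """Return the named {fields} used in a format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def inconsistent_locales(
    translations: dict[str, str], reference: str = "en_US"
) -> list[str]:
    """List locales whose placeholders differ from the reference locale."""
    expected = placeholders(translations[reference])
    return sorted(
        locale
        for locale, text in translations.items()
        if placeholders(text) != expected
    )


welcome = {
    "en_US": "Welcome, {name}!",
    "id_ID": "Selamat datang, {nama}!",  # placeholder translated by mistake
    "fr_FR": "Bienvenue, {name}!",
}
problems = inconsistent_locales(welcome)
print(problems)
```

A check like this fits naturally into a test suite or a pre-commit hook that runs over every message in the catalogs.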

3. Provide Context for Ambiguous Words

Use msgctxt to disambiguate:

4. Handle Pluralization Properly

Always use plural-aware functions for counts:

5. Configure Once, Set Locale Per Request

In web applications, configure globally and set locale per request:

6. Use Lazy Translations for Module-Level Strings

For strings defined at module level that need runtime evaluation:
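The reason lazy translation matters is that module-level code runs once at import time, before any request has chosen a locale. The sketch below shows the general technique with a small wrapper that defers the lookup until the string is rendered. `LazyString`, `CATALOGS`, and `current` are hypothetical names for this illustration and are not the gllm-intl API.

```python
CATALOGS = {
    "en_US": {"app_title": "Dashboard"},
    "id_ID": {"app_title": "Dasbor"},
}
current = {"locale": "en_US"}


class LazyString:
    """Defers the catalog lookup until the value is rendered with str()."""

    def __init__(self, msgid: str) -> None:
        self.msgid = msgid

    def __str__(self) -> str:
        # Falls back to the msgid itself when no translation exists.
        return CATALOGS[current["locale"]].get(self.msgid, self.msgid)


# Module-level constant: created once, before any locale is known.
APP_TITLE = LazyString("app_title")

rendered_en = str(APP_TITLE)
current["locale"] = "id_ID"
rendered_id = str(APP_TITLE)  # same object, different locale at render time
print(rendered_en, rendered_id)
```

An eager lookup at import time would have frozen the constant to whichever locale happened to be active when the module loaded.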

7. Test with Multiple Locales

Include locale switching in your tests:

8. Document Your Message IDs

Keep a reference document of message IDs and their purpose:

9. Handle Missing Translations Gracefully

Set up appropriate fallbacks:
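As a point of comparison, Python's standard `gettext` module offers the same choice between lenient and strict handling of missing catalogs. The sketch below uses only the standard library and is not the gllm-intl API; the locale "xx_XX" is a made-up name that deliberately has no catalog.

```python
import gettext
import tempfile

with tempfile.TemporaryDirectory() as empty_locales_dir:
    # fallback=True: no catalog found, so a NullTranslations object is
    # returned, and it echoes the msgid back unchanged.
    lenient = gettext.translation(
        "messages",
        localedir=empty_locales_dir,
        languages=["xx_XX"],
        fallback=True,
    )
    lenient_result = lenient.gettext("greeting")

    # fallback=False (the default): a missing catalog raises FileNotFoundError.
    try:
        gettext.translation(
            "messages", localedir=empty_locales_dir, languages=["xx_XX"]
        )
        strict_result = "found"
    except FileNotFoundError:
        strict_result = "missing catalog"

print(lenient_result, "|", strict_result)
```

Lenient handling keeps the application running and makes untranslated keys visible in the UI, while strict handling is better suited to tests and CI, where a missing catalog should fail loudly.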

10. Extract and Update Catalogs Regularly

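Maintain up-to-date translation files: re-extract messages when the code changes, merge them into each locale's .po file (for example with pybabel extract and pybabel update), and recompile the .mo files.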
