Language Model (LM) Invoker

gllm-inference | Tutorial: Language Model (LM) Invoker| Use Case: Utilize Language Model Request Processor | API Reference | Cookbook

What’s an LM Invoker?

The LM invoker is a unified interface designed to help you interact with language models to generate outputs based on the provided inputs. In this tutorial, you'll learn how to invoke a language model using OpenAILMInvoker in just a few lines of code. You can also explore other types of LM Invokers, available here.

Prerequisites

This example specifically requires completion of all setup steps listed on the Prerequisites page.

Available LM Invokers

The LM invoker provides the following built-in implementations:

AnthropicLMInvoker
AzureOpenAILMInvoker
BedrockLMInvoker
DatasaurLMInvoker
GoogleLMInvoker
LangChainLMInvoker
LiteLLMLMInvoker
OpenAIChatCompletionsLMInvoker
OpenAILMInvoker
PortkeyLMInvoker
SeaLionLMInvoker
XAILMInvoker

Installation

# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ gllm-inference

# you can use a Conda environment
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/"  "gllm-inference"

Quickstart

Initialization and Invoking

Let’s jump into a basic example using OpenAILMInvoker. We’ll ask the model a simple question and print the output.

import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)
output = asyncio.run(lm_invoker.invoke("What is the capital city of Indonesia?"))
print(f"output: {output.text}")

Output:

output: Jakarta.

Streaming

To achieve streaming, simply pass an event emitter when invoking the LM invoker. This allows us to process the generated tokens without having to wait for the entire generation process to finish.

import asyncio
from gllm_core.event import EventEmitter
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

event_emitter = EventEmitter.with_print_handler()
lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)
output = asyncio.run(lm_invoker.invoke("What is the capital city of Indonesia?"))
print(f"output: {output}")

In the example above, we're using the PrintEventHandler which is intended to print the streamed tokens in a beautified format. For more details about the event emitter, please refer to the event emitter tutorial page.

Message Roles

Modern LMs understand context better when you structure inputs like a real conversation. That’s where message roles come in. You can simulate multi-turn chats, set instructions, or give memory to the model through a structured message format.

Example 1: Passing a system message

import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM
from gllm_inference.schema import Message

messages = [
    Message.system("Talk like a pirate."),
    Message.user("Hi, there! How are you doing?")
]

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)
output = asyncio.run(lm_invoker.invoke(messages))
print(f"output: {output.text}")

Output:

output: Ahoy, matey! I be doin' well, savvy? How be ye farin' on this fine day?

Example 2: Simulating a multi-turn conversation

import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM
from gllm_inference.schema import Message

messages = [
    Message.system("You are a helpful assistant."),
    Message.user("What is the capital of France?"),
    Message.assistant("The capital of France is Paris."),
    Message.user("What about Indonesia?"),
]

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)
output = asyncio.run(lm_invoker.invoke(messages))
print(f"output: {output.text}")

Output:

output: The capital of Indonesia is Jakarta.

Multimodal Input

Our LM Invokers supports attachments (images, documents, etc.). This lets you send rich content and ask the model to analyze or describe them.

Loading Attachments

An Attachment is a content object that can be loaded in the following ways:

using a remote file (URL)
using a local file
using a data URL
using raw bytes

We can load the attachment in the following ways:

from gllm_inference.schema import Attachment

# From remote URL
image = Attachment.from_url("https://example.com/image.png")

# From local file
image = Attachment.from_path("path/to/image.jpeg")

# From Data URL
image = Attachment.from_data_url("data:image/jpeg;base64,<base64_encoded_image>")

# From bytes
image = Attachment.from_bytes(b"<image_bytes>")

Example 1: Describe an image

import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM
from gllm_inference.schema import Attachment

image = Attachment.from_path("path/to/dog.jpeg")

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)
output = asyncio.run(lm_invoker.invoke(["What is this?", image]))
print(f"output: {output.text}")

Output:

output: A cute golden retriever puppy.

Example 2: Analyze a PDF

import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM
from gllm_inference.schema import Attachment

URL = "https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf"
document = Attachment.from_url(URL)

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)
output = asyncio.run(lm_invoker.invoke(["What is the title of this file?", document]))
print(f"output: {output.text}")

Output:

output: This is the scientific paper titled "Attention Is All You Need".

Supported Attachment Types

Each LM might support different types of inputs. You can find more about supported type for each LM Invoker here.

Structured Output

In many real-world applications, we don't just want natural language outputs — we want structured data that our programs can parse and use directly.

You can define your expected output using:

A Pydantic BaseModel class (recommended).
A JSON schema dictionary compatible with Pydantic's schema format.

When structured output is enabled, structured output results are stored in the outputs attribute of the LMOutput object and can be accessed via the structured_outputs property. The output type depends on the input schema:

Pydantic instance → The output will be a Pydantic BaseModel instance.
JSON schema dict → The output will be a Python dictionary.

Using a Pydantic BaseModel (Recommended)

You can define your expected output format as a Pydantic class. This ensures strong type safety and makes the output easier to work with in Python.

import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM
from pydantic import BaseModel

class Animal(BaseModel):
    name: str
    size: str
    diet: str
    color: list[str]
    legs: int

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, response_schema=Animal)
output = asyncio.run(lm_invoker.invoke("Describe a chicken!"))
print(f"output: {output.structured_output}")

Output:

output: Animal(
    name='Chicken',
    size='Medium',
    diet='Omnivore',
    color=['varies by breed'],
    legs=2,
)

Using a JSON Schema Dictionary

Alternatively, you can define the structure using a JSON schema dictionary.

import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

animal_schema = {
    "properties": {
        "color": {
            "items": {"type": "string"},
            "title": "Color",
            "type": "array"
        },
        "diet": {"title": "Diet", "type": "string"},
        "legs": {"title": "Legs", "type": "integer"},
        "name": {"title": "Name", "type": "string"},
        "size": {"title": "Size", "type": "string"}
    },
    "required": ["name", "size", "diet", "color", "legs"],
    "title": "Animal",
    "type": "object",
}

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, response_schema=animal_schema)
output = asyncio.run(lm_invoker.invoke("Describe a chicken!"))
print(f"output: {output.structured_output}")

Output

output: {
    'color': ['reddish-brown feathers', 'white underparts'],
    'diet': 'Omnivore; seeds, grains, insects, greens',
    'legs': 2,
    'name': 'Chicken',
    'size': 'Medium',
}

If JSON schema is used, it must still be compatible with Pydantic's JSON schema, especially for complex schemas. For this reason, it is recommended to create the JSON schema using Pydantic's model_json_schema method.

animal_schema = Animal.model_json_schema()

Tool Calling

Tool calling means letting a language model call external functions to help it solve a task. It allows the AI to interact with external functions and APIs during the conversation, enabling dynamic computation, data retrieval, and complex workflows.

Think of it as:

The LM is smart at reading and reasoning, but when it needs to calculate or get external data, it picks up the phone and calls your "tool".

For more information about tools definition, please refer to this guide.

LM Invocation with Tool

Let's try to integrate a simple math operation tool to our LM invoker!

import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM
from gllm_core.schema.tool import tool

@tool
def get_weather(city: str) -> str:
    """Get the weather of a city.

    Args:
        city (str): The city to get the weather of.

    Returns:
        str: The weather of the city.
    """
    return f"The weather of {city} is sunny."

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, tools=[get_weather])
output = asyncio.run(lm_invoker.invoke("What is the weather of Jakarta?"))
print(f"output: {output.tool_calls}")

Output:

output: [
    ToolCall(
        id='call_BjoVYmFI1CkkSNTzP3LtkGRK',
        name='get_weather',
        args={'city': 'Jakarta'},
        data=None,
    )
]

When the LM Invoker is invoked with tool calling capability, the model will return the tool calls. In this case, we still need to execute the tools and feed the result back to the LM invoker ourselves. If you'd like to handle this looping process automatically, please refere to the LM Request Processor component.

Native Tools

Native tools are a specific set tools that allow the language model to execute certain built-in capabilities during the invocation, enabling dynamic computation, data retrieval, and complex workflows. Similar to the user-defined tools, the native tools can be enabled by passing them through the LM invoker's tools parameter.

Each type of native tools is only available for certain LM invokers. Please find the available native tools below:

Code interpreter — Writes and runs Python code in a sandboxed environment.
Image generation — Generates an image based on the provided query.
MCP Server — Uses remote MCP servers to give models new capabilities.
MCP Connector — Retrieves data from remote MCP connectors.
Skill — Manages custom skills on the provider's server side.
Web search — Searches the web for relevant information.

Thinking

Certain language model providers and models supports thinking. Thinking allows models to produce an internal chain of thought before responding to the user. This enables model to perform advanced tasks such as complex problem solving, coding, scientific reasoning, and multi-step planning for agentic workflows.

When thinking is enabled, thinking results are stored in the outputs attribute of the LMOutput object and can be accessed via the thinkings property.

Let's try to perform thinking by using OpenAI's gpt-5-nano model:

import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, thinking=True)

query = "Solve x^2 + 2x + 1 = 0"
output = asyncio.run(lm_invoker.invoke(query))
for item in output.outputs:
    print(f"=== Output item: {item.type!r} ===\n{item.output}\n")

Output:

=== Output item: 'thinking' ===
id='rs_010dc18bedf628330069688c3967a88194abd178f92ef8acaa'
reasoning="""
**Solving quadratic equations**\n\nThe user asks me to solve the equation x^2 + 2x + 1 = 0.
This factors to (x+1)^2 = 0, so the double root is x = -1. I can concisely show the steps.
Additionally, I should note the discriminant calculation, which is b² - 4ac = 0.
If the user wants a step-by-step breakdown, I can explain that recognizing it's a perfect square
helps solve it. Ultimately, the final answer is x = -1 (double root).
"""

=== Output item: 'text' ===
x^2 + 2x + 1 = 0 factors as (x + 1)^2 = 0.
So x = -1 (a double root).

For more fine-grained control, you can use ThinkingConfig to pass provider-specific thinking parameters. Provider-specific parameters can be found in the provider's documentation.

Pass the provider specific parameter in kwargs

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.schema import ThinkingConfig

thinking_config = ThinkingConfig(enabled=True, kwargs={"effort": "high"})
lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, thinking=thinking_config)

Output Analytics

Output analytics enables you to collect detailed metrics and insights about your language model invocations. When output analytics is enabled, the output includes the following extra attributes:

token_usage : Input and output token counts.
duration : Time taken to generate the output.
finish_details: Additional details about how the generation finished.

To enable output analytics, simply need to set the output_analytics parameter to True.

import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, output_analytics=True)
output = asyncio.run(lm_invoker.invoke("What is the capital of France?"))
print(f"output: {output}")

Output:

output: LMOutput(
    outputs=[LMOutputItem(type="text", output="Paris.")],
    token_usage=input_tokens=13 output_tokens=8,
    duration=2.211723566055298,
    finish_details={'status': 'completed', 'incomplete_details': {'reason': None}}
)

Retry & Timeout

Retry & timeout functionality provides robust error handling and reliability for language model interactions. It allows you to automatically retry failed requests and set time limits for operations, ensuring your applications remain responsive and resilient to network issues or API failures.

Retry & timeout can be configured via the RetryConfig class' parameters:

max_retries: Maximum number of retry attempts (defaults to 3 maximum retry attempts).
timeout: Maximum time in seconds to wait for each request (defaults to 30.0 seconds). To disable timeout, this parameter can be set to None.

Let's try to apply it to our LM invoker!

import asyncio
from gllm_core.utils.retry import RetryConfig
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

retry_config = RetryConfig(max_retries=3, timeout=100)
lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, retry_config=retry_config)
output = asyncio.run(lm_invoker.invoke("What is the capital of France?"))
print(f"output: {output}")

Extra Capabilities

Some LM invokers also provide additional capabities that are useful in certain cases:

Input Transformer — To transform the language model messages input before invocation.
Output Transformer — To transform the raw output from the language model into a different format or structure.
Batch Invocation — To manage batch requests for cheaper but slower invocations.
File Management — To manage uploaded files in their server side. These files can then be used as inputs during invocations.
Data Store Management — To manage built-in data stores to be used as internal knowledge base. This allows the LM invoker to perform built-in RAG (Retrieval-Augmented Generation).

If you encounter errors, refer to the Troubleshooting Guide for detailed explanations of common errors and how to resolve them.

PreviousInference NextCode Interpreter

Last updated 6 hours ago

Was this helpful?

hashtagWhat’s an LM Invoker?

hashtagAvailable LM Invokers

hashtagInstallation

hashtagQuickstart

hashtagStreaming

hashtagMessage Roles

hashtagMultimodal Input

hashtagStructured Output

hashtagTool Calling

hashtagNative Tools

hashtagThinking

hashtagOutput Analytics

hashtagRetry & Timeout

hashtagExtra Capabilities

What’s an LM Invoker?

Available LM Invokers

Installation

Quickstart

Streaming

Message Roles

Multimodal Input

Structured Output

Tool Calling

Native Tools

Thinking

Output Analytics

Retry & Timeout

Extra Capabilities