Language Model (LM) Invoker
What’s an LM Invoker?
The LM invoker is a unified interface designed to help you interact with language models to generate outputs based on the provided inputs. In this tutorial, you'll learn how to invoke a language model using OpenAILMInvoker in just a few lines of code. You can also explore other types of LM Invokers, available here.
Installation
```bash
# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ gllm-inference
```

On Windows (CMD), the access token can be injected like this instead:

```bat
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-inference"
```

Quickstart
Initialization and Invoking
Let’s jump into a basic example using OpenAILMInvoker. We’ll ask the model a simple question and print the response.
```python
import asyncio

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)

response = asyncio.run(lm_invoker.invoke("What is the capital city of Indonesia?"))
print(f"Response: {response}")
```

Output:
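The exact wording varies between runs, but you should see something along these lines:

```
Response: The capital city of Indonesia is Jakarta.
```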
Understanding LM Invoker Output Type
Depending on how you configure the LM Invoker, the result of invoke(...) may be either a plain string or an LMOutput object:
String → returned when you don't request any extra features (see the comparison sketched after this list).
LMOutput → returned whenever a response contains more than just plain text, such as when features like Structured Output, Tool Calling, Reasoning, Citations (Web Search), Output Analytics, or Code Execution are used. In these cases, you can still access the generated string through response.response, while also taking advantage of the additional attributes exposed by the enabled feature.
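Here is a minimal sketch of the difference. Only response.response, the output_analytics flag, and the token_usage attribute are taken from this page; passing output_analytics to the constructor (rather than to invoke) is an assumption:

```python
import asyncio

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

# No extra features requested: the result is a plain string.
lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)
plain = asyncio.run(lm_invoker.invoke("What is the capital city of Indonesia?"))
print(type(plain))  # <class 'str'>

# With an extra feature enabled, the result is an LMOutput object instead.
analytics_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, output_analytics=True)
rich = asyncio.run(analytics_invoker.invoke("What is the capital city of Indonesia?"))
print(rich.response)     # the generated text is still available here
print(rich.token_usage)  # extra attribute exposed by the enabled feature
```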
Below are the key attributes of LMOutput, depending on which features are used. The response attribute is always present; the feature-specific attributes (such as structured_output, reasoning, citations, code_exec_results, token_usage, duration, and finish_details) are described in their respective sections below.

| Feature | Attribute | Type |
| --- | --- | --- |
| All | response | str |
Message Roles
Modern LMs understand context better when you structure inputs like a real conversation. That’s where message roles come in. You can simulate multi-turn chats, set instructions, or give memory to the model through a structured message format.
Example 1: Passing a system message
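The exact message schema is not shown on this page, so the message construction below, a list of (role, content) pairs, is an assumption used purely for illustration; check the Message Roles reference for the format your version expects:

```python
import asyncio

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)

# Hypothetical message format: a system instruction followed by a user message.
messages = [
    ("system", "You are a geography tutor. Answer in a single short sentence."),
    ("user", "What is the capital city of Indonesia?"),
]

response = asyncio.run(lm_invoker.invoke(messages))
print(f"Response: {response}")
```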
Output:
Example 2: Simulating a multi-turn conversation
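Again treating the message format as an assumption, a multi-turn conversation can be simulated by replaying the earlier user and assistant turns before the new question:

```python
import asyncio

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)

# Hypothetical message format: earlier turns give the model "memory" of the chat.
messages = [
    ("user", "My name is Budi and I live in Bandung."),
    ("assistant", "Nice to meet you, Budi! How can I help you today?"),
    ("user", "Which city do I live in?"),
]

response = asyncio.run(lm_invoker.invoke(messages))
print(f"Response: {response}")
```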
Output:
Multimodal Input
Our LM Invokers support attachments (images, documents, etc.). This lets you send rich content and ask the model to analyze or describe it.
Loading Attachments
An Attachment is a content object that can be loaded in the following ways:
using a remote file (URL)
using a local file
using a data URL
using raw bytes
In code, that looks roughly as follows.
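The factory method names below (from_url, from_path, from_data_url, from_bytes) and the import path are assumptions for illustration; consult the Attachment reference for the exact helpers in your version:

```python
from pathlib import Path

from gllm_inference.schema import Attachment  # import path is an assumption

# 1. From a remote file (URL).
image = Attachment.from_url("https://example.com/cat.png")

# 2. From a local file.
document = Attachment.from_path("reports/annual_report.pdf")

# 3. From a data URL (truncated placeholder content).
logo = Attachment.from_data_url("data:image/png;base64,iVBORw0KGgo...")

# 4. From raw bytes.
raw = Attachment.from_bytes(Path("reports/annual_report.pdf").read_bytes())
```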
Example 1: Describe an image
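A sketch of the image example, reusing the assumed Attachment helpers above and assuming the invoker accepts a mixed list of text and attachments:

```python
import asyncio

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM
from gllm_inference.schema import Attachment  # import path is an assumption

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)

# Hypothetical multimodal input: a text prompt plus an image attachment.
image = Attachment.from_url("https://example.com/cat.png")
response = asyncio.run(lm_invoker.invoke(["Describe this image in one sentence.", image]))
print(f"Response: {response}")
```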
Output:
Example 2: Analyze a PDF
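The same pattern applies to documents; the local path and the from_path helper remain illustrative assumptions:

```python
import asyncio

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM
from gllm_inference.schema import Attachment  # import path is an assumption

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)

# Hypothetical local PDF; `from_path` is an assumed helper.
report = Attachment.from_path("reports/annual_report.pdf")
response = asyncio.run(lm_invoker.invoke(["Summarize the key findings of this report.", report]))
print(f"Response: {response}")
```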
Output:
Supported Attachment Types
Each LM might support different types of inputs. As of now, OpenAILMInvoker supports images and documents. You can find more about the supported types for each LM invoker here.
Structured Output
In many real-world applications, we don't just want natural language responses — we want structured data that our programs can parse and use directly.
You can define your expected output using:
A Pydantic BaseModel class (recommended).
A JSON schema dictionary compatible with Pydantic's schema format.
When structured output is enabled, the output will be stored in the structured_output attribute of the response. The output type depends on the input schema:
Pydantic instance → The output will be a Pydantic BaseModel instance.
JSON schema dict → The output will be a Python dictionary.
Using a Pydantic BaseModel (Recommended)
You can define your expected response format as a Pydantic class. This ensures strong type safety and makes the output easier to work with in Python.
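A minimal sketch: defining the schema as a Pydantic class and reading structured_output are taken from this page, but response_schema as the parameter name (and passing it to the constructor) is an assumption:

```python
import asyncio

from pydantic import BaseModel

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM


class Capital(BaseModel):
    """Expected response format."""

    country: str
    capital_city: str


# `response_schema` is an assumed parameter name; check the structured output reference.
lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, response_schema=Capital)

result = asyncio.run(lm_invoker.invoke("What is the capital city of Indonesia?"))
print(result.structured_output)               # Capital(country='Indonesia', capital_city='Jakarta')
print(result.structured_output.capital_city)  # "Jakarta"
```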
Output:
Using a JSON Schema Dictionary
Alternatively, you can define the structure using a JSON schema dictionary.
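A sketch with a hand-written schema; as above, the response_schema parameter name is an assumption, while the dictionary result in structured_output is taken from this page:

```python
import asyncio

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

# A JSON schema dictionary compatible with Pydantic's schema format.
capital_schema = {
    "title": "Capital",
    "type": "object",
    "properties": {
        "country": {"type": "string"},
        "capital_city": {"type": "string"},
    },
    "required": ["country", "capital_city"],
}

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, response_schema=capital_schema)

result = asyncio.run(lm_invoker.invoke("What is the capital city of Indonesia?"))
print(result.structured_output)  # a plain dict, e.g. {"country": "Indonesia", "capital_city": "Jakarta"}

# As noted below, generating the schema from a Pydantic model is the safer option:
# capital_schema = Capital.model_json_schema()
```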
Output:
If a JSON schema is used, it must still be compatible with Pydantic's JSON schema format, especially for complex schemas. For this reason, it is recommended to generate the JSON schema using Pydantic's model_json_schema method.
Tool Calling
Tool calling means letting a language model call external functions to help it solve a task. It allows the AI to interact with external functions and APIs during the conversation, enabling dynamic computation, data retrieval, and complex workflows.
Think of it as:
The LM is smart at reading and reasoning, but when it needs to calculate or get external data, it picks up the phone and calls your "tool".
Tool Definition
We can easily convert a function into a tool using the tool decorator.
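A sketch of a tool definition; the import path of the tool decorator is an assumption, and the exchange rates are dummy data for the example:

```python
# The import path of the `tool` decorator is an assumption; adjust it to match
# your installed version of the library.
from gllm_inference.schema import tool


@tool
def get_exchange_rate(base: str, target: str) -> float:
    """Return the exchange rate from the `base` currency to the `target` currency."""
    rates = {("USD", "IDR"): 16_000.0, ("USD", "EUR"): 0.92}  # dummy data for the example
    return rates.get((base, target), 1.0)
```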
LM Invocation with Tool
Let's try to integrate the above tool into our LM invoker!
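A sketch of the invocation, reusing the get_exchange_rate tool defined above; passing tools to the constructor and the tool_calls attribute name are assumptions, while the need to execute the tool calls ourselves is described below:

```python
import asyncio

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

# `tools` as a constructor parameter is an assumption; check the tool calling reference.
lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, tools=[get_exchange_rate])

result = asyncio.run(lm_invoker.invoke("How many Indonesian rupiah is 25 US dollars?"))

# With tool calling enabled, the result is an LMOutput; the requested tool calls
# (tool name + arguments) still need to be executed by our own code.
print(result.tool_calls)
```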
Output:
When the LM invoker is invoked with tool calling enabled, the model returns the requested tool calls. In this case, we still need to execute the tools and feed the results back to the LM invoker ourselves. If you'd like this looping process to be handled automatically, please refer to the LM Request Processor component.
Output Analytics
Output analytics enables you to collect detailed metrics and insights about your language model invocations. When output analytics is enabled, the response includes the following extra attributes:
token_usage: Input and output token counts.
duration: Time taken to generate the response.
finish_details: Additional details about how the generation finished.
To enable output analytics, simply set the output_analytics parameter to True.
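A sketch assuming output_analytics is accepted by the constructor; the extra attributes are the ones listed above:

```python
import asyncio

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

# Assuming `output_analytics` is passed to the constructor.
lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, output_analytics=True)

result = asyncio.run(lm_invoker.invoke("What is the capital city of Indonesia?"))
print(result.response)        # the generated text
print(result.token_usage)     # input and output token counts
print(result.duration)        # time taken to generate the response
print(result.finish_details)  # details about how the generation finished
```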
Output:
Retry & Timeout
Retry & timeout functionality provides robust error handling and reliability for language model interactions. It allows you to automatically retry failed requests and set time limits for operations, ensuring your applications remain responsive and resilient to network issues or API failures.
Retry & timeout can be configured via the RetryConfig class' parameters:
max_retries: Maximum number of retry attempts (defaults to 3).
timeout: Maximum time in seconds to wait for each request (defaults to 30.0 seconds). To disable the timeout, set this parameter to 0.0.
You can also configure the other parameters available here. Now let's apply it to our LM invoker!
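A sketch of the wiring; the import path of RetryConfig and the retry_config constructor parameter are assumptions, while max_retries and timeout are the parameters described above:

```python
import asyncio

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM
# The import path of RetryConfig is an assumption; adjust it to your version.
from gllm_inference.schema import RetryConfig

# Retry up to 5 times and give each request at most 10 seconds.
retry_config = RetryConfig(max_retries=5, timeout=10.0)

# Passing the config via a `retry_config` constructor parameter is an assumption.
lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, retry_config=retry_config)

response = asyncio.run(lm_invoker.invoke("What is the capital city of Indonesia?"))
print(f"Response: {response}")
```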
Reasoning
Certain language model providers and models support reasoning. When reasoning is used, models think before they answer, producing a long internal chain of thought before responding to the user. Reasoning allows the model to perform advanced tasks such as complex problem solving, coding, scientific reasoning, and multi-step planning for agentic workflows.
However, it's important to note that the way reasoning works varies between providers. Therefore, it is important to check how each class handles reasoning before using it.
In the case of OpenAILMInvoker, reasoning is done by using reasoning models (often referred to as the o-series models). We can then set the reasoning_summary parameter to output the reasoning tokens.
The reasoning tokens will be stored in the reasoning attribute of the output.
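A sketch; the specific model member (O4_MINI) and the reasoning_summary value are assumptions, while the reasoning attribute is taken from this page:

```python
import asyncio

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

# Assuming an o-series reasoning model is available as `OpenAILM.O4_MINI` and
# that `reasoning_summary` accepts a summary mode such as "auto".
lm_invoker = OpenAILMInvoker(OpenAILM.O4_MINI, reasoning_summary="auto")

question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. "
    "How much does the ball cost?"
)
result = asyncio.run(lm_invoker.invoke(question))
print(result.reasoning)  # the reasoning summary tokens
print(result.response)   # the final answer ($0.05)
```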
Output:
Web Search
Web search is a built-in tool that allows the language model to search the web for relevant information. Currently, this is only supported by the OpenAILMInvoker. This feature can be enabled by setting the web_search parameter to True.
When it's enabled, the output will include some citations to the sources retrieved to generate the response. These are stored in the citations attribute.
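A sketch; whether web_search is passed to the constructor is an assumption, while the citations attribute is taken from this page:

```python
import asyncio

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

# Assuming `web_search` is passed to the constructor.
lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, web_search=True)

result = asyncio.run(lm_invoker.invoke("What is the latest stable version of Python?"))
print(result.response)   # the generated answer
print(result.citations)  # sources retrieved while generating the response
```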
Output:
Code Interpreter
Code interpreter is a feature that allows the language model to write and run Python code in a sandboxed environment to solve complex problems in domains like data analysis, coding, and math. Currently, this is only supported by the OpenAILMInvoker. This feature can be enabled by setting the code_interpreter parameter to True.
When it's enabled, the output will store the code execution results in the code_exec_results attribute.
Since OpenAI models internally recognize the code interpreter as the Python tool, it's recommended to explicitly instruct the model to use the Python tool when using the code interpreter to ensure more reliable code execution. Let's try it on a simple math problem!
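A sketch; code_interpreter as the parameter name (and passing it to the constructor) is an assumption, while the code_exec_results attribute is taken from this page:

```python
import asyncio

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

# Assuming `code_interpreter` is passed to the constructor.
lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, code_interpreter=True)

# Explicitly asking for the Python tool makes code execution more reliable.
prompt = "Use the Python tool to compute the sum of the squares of the integers from 1 to 100."
result = asyncio.run(lm_invoker.invoke(prompt))

print(result.response)           # the final answer (338350)
print(result.code_exec_results)  # the executed code and its results
```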
Output:
What's awesome about the code interpreter is that it can produce more than just text! In the example below, let's try creating a histogram using the code interpreter. We're going to save any generated attachment to our local path.
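A sketch; the structure of code_exec_results, and in particular how generated files are exposed (here pretended to be attachments with filename and data fields), is an assumption made purely for illustration:

```python
import asyncio
from pathlib import Path

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, code_interpreter=True)

prompt = (
    "Use the Python tool to draw 1,000 samples from a standard normal distribution, "
    "plot them as a histogram, and return the plot as a PNG image."
)
result = asyncio.run(lm_invoker.invoke(prompt))

# Hypothetical attachment handling: each execution result may carry generated
# files with `filename` and `data` (bytes) fields.
output_dir = Path("outputs")
output_dir.mkdir(exist_ok=True)
for exec_result in result.code_exec_results:
    for attachment in getattr(exec_result, "attachments", []):
        (output_dir / attachment.filename).write_bytes(attachment.data)
```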
Output:
Below is the generated histogram that has been saved in our local path. What an awesome way to use a language model!

MCP Server Integration
Coming Soon!
Batch Processing
Batch processing is a feature that allows the language model to process multiple requests in a single batch job. Batch processing is generally cheaper than standard processing, but slower in exchange. Thus, it's suitable for large volumes of requests where latency is not a concern.
Currently, batch processing is only available for certain LM invokers. This feature can be accessed via the batch attribute of the LM invoker. As an example, let's try executing a batch processing request using the AnthropicLMInvoker:
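A sketch of an end-to-end batch run; the model identifier, the constructor signature, and the method names under batch (create, status, retrieve) are all assumptions, loosely mirroring the standalone operations listed below:

```python
import asyncio

from gllm_inference.lm_invoker import AnthropicLMInvoker

# The model identifier and constructor signature are assumptions.
lm_invoker = AnthropicLMInvoker("claude-3-5-haiku-latest")

requests = [
    "What is the capital city of Indonesia?",
    "What is the capital city of Japan?",
]


async def run_batch() -> None:
    # Hypothetical method names: create the job, poll its status, then fetch results.
    job_id = await lm_invoker.batch.create(requests)
    while (await lm_invoker.batch.status(job_id)) != "completed":
        await asyncio.sleep(30)
    for result in await lm_invoker.batch.retrieve(job_id):
        print(result)


asyncio.run(run_batch())
```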
Output:
Alternatively, the following standalone batch processing operations can also be executed separately:
Create a Batch Job
Get a Batch Job Status
Retrieve Batch Job Results
List Batch Jobs
Cancel a Batch Job