# Language Model (LM) Invoker

[**`gllm-inference`**](https://github.com/GDP-ADMIN/gl-sdk/tree/main/libs/gllm-inference/gllm_inference/catalog)  | **Tutorial**:  [lm-invoker](https://gdplabs.gitbook.io/sdk/~/revisions/beykCxz0UanaEX0sPJJu/tutorials/inference/lm-invoker "mention")| **Use Case:** [utilize-language-model-request-processor](https://gdplabs.gitbook.io/sdk/~/revisions/beykCxz0UanaEX0sPJJu/how-to-guides/utilize-language-model-request-processor "mention") | [API Reference](https://api.python.docs.gdplabs.id/gen-ai/library/gllm_inference/api/lm_invoker.html)

### What’s an LM Invoker? <a href="#whats-an-em-invoker" id="whats-an-em-invoker"></a>

The **LM invoker** is a unified interface designed to help you interact with language models to generate outputs based on the provided inputs. In this tutorial, you'll learn how to invoke a language model using `OpenAILMInvoker` in **just a few lines of code**. You can also explore other types of LM Invokers, available [here](https://api.python.docs.glair.ai/generative-internal/library/gllm_inference/api/lm_invoker.html).

<details>

<summary>Prerequisites</summary>

This example specifically requires completion of all setup steps listed on the [prerequisites](https://gdplabs.gitbook.io/sdk/~/revisions/beykCxz0UanaEX0sPJJu/gen-ai-sdk/prerequisites "mention") page.

</details>

## Installation

{% tabs %}
{% tab title="Linux, macOS, or Windows WSL" %}

```bash
# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ gllm-inference
```

{% endtab %}

{% tab title="Windows Powershell" %}

```powershell
# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ gllm-inference
```

{% endtab %}

{% tab title="Windows Command Prompt" %}

```batch
REM you can use a Conda environment
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-inference"
```

{% endtab %}
{% endtabs %}

## Quickstart

### Initialization and Invoking

Let’s jump into a basic example using `OpenAILMInvoker`. We’ll ask the model a simple question and print the response.

```python
import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)
response = asyncio.run(lm_invoker.invoke("What is the capital city of Indonesia?"))
print(f"Response: {response}")
```

**Output:**

```
Response: Jakarta.
```

### Understanding LM Invoker Output Type

{% hint style="info" %}
Starting from v0.6.0, the LM Invoker will return an `LMOutput` object only.
{% endhint %}

Depending on how you configure the LM Invoker, the result of `invoke(...)` may be either a **plain string** or an **`LMOutput` object**:

1. **String** → returned when you don’t request any extra features.\
   Example:

   ```python
   response = asyncio.run(lm_invoker.invoke("What is the capital of Indonesia?"))
   print(response)  # "Jakarta."
   ```
2. **`LMOutput`** → returned whenever a response contains more than just plain text, such as when features like **Structured Output**, **Tool Calling**, **Reasoning**, **Citations (Web Search)**, **Output Analytics**, or **Code Execution** are used. In these cases, you can still access the generated string through `response.response`, while also taking advantage of the additional attributes exposed by the enabled features.

Below are the key attributes of `LMOutput`, based on which features are used:

| Feature                                            | Attribute           | Type                        |
| -------------------------------------------------- | ------------------- | --------------------------- |
| All                                                | `response`          | `str`                       |
| [#tool-calling](#tool-calling "mention")           | `tool_calls`        | `list[ToolCall]`            |
| [#structured-output](#structured-output "mention") | `structured_output` | `dict \| BaseModel \| None` |
| [#output-analytics](#output-analytics "mention")   | `token_usage`       | `TokenUsage \| None`        |
| [#output-analytics](#output-analytics "mention")   | `duration`          | `float \| None`             |
| [#output-analytics](#output-analytics "mention")   | `finish_details`    | `dict`                      |
| [#reasoning](#reasoning "mention")                 | `reasoning`         | `list[Reasoning]`           |
| [#web-search](#web-search "mention")               | `citations`         | `list[Chunk]`               |
| [#code-interpreter](#code-interpreter "mention")   | `code_exec_results` | `list[CodeExecResult]`      |
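
If your code may receive either return type (for example, on versions before v0.6.0), a small defensive check keeps things simple. The sketch below assumes only what is described above: plain strings pass through, while `LMOutput` objects expose the generated text via `response`.

```python
import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, output_analytics=True)
result = asyncio.run(lm_invoker.invoke("What is the capital city of Indonesia?"))

# A plain string is returned when no extra features are enabled;
# otherwise the generated text lives in the `response` attribute.
text = result if isinstance(result, str) else result.response
print(text)
```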

***

## Message Roles

Modern LMs understand context better when you structure inputs like a real conversation. That’s where **message roles** come in. You can simulate multi-turn chats, set instructions, or give memory to the model through a structured message format.

### Example 1: Passing a system message

```python
import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM
from gllm_inference.schema import Message

messages = [
    Message.system("Talk like a pirate."),
    Message.user("Hi, there! How are you doing?")
]

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)
response = asyncio.run(lm_invoker.invoke(messages))
print(f"Response: {response}")
```

**Output:**

```
Response: Ahoy, matey! I be doin' well, savvy? How be ye farin' on this fine day?
```

### Example 2: Simulating a multi-turn conversation

```python
import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM
from gllm_inference.schema import Message

messages = [
    Message.system("You are a helpful assistant."),
    Message.user("What is the capital of France?"),
    Message.assistant("The capital of France is Paris."),
    Message.user("What about Indonesia?"),
]

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)
response = asyncio.run(lm_invoker.invoke(messages))
print(f"Response: {response}")
```

**Output:**

```
Response: The capital of Indonesia is Jakarta.
```

***

## Multimodal Input

Our LM Invokers support **attachments** (images, documents, etc.). This lets you send rich content and ask the model to analyze or describe it.

### Loading Attachments

An `Attachment` is a content object that can be loaded in the following ways:

* from a **remote file** (URL)
* from a **local file**
* from a **data URL**
* from raw **bytes**

For example:

```python
from gllm_inference.schema import Attachment

# From remote URL
image = Attachment.from_url("https://example.com/image.png")

# From local file
image = Attachment.from_path("path/to/image.jpeg")

# From Data URL
image = Attachment.from_data_url("data:image/jpeg;base64,<base64_encoded_image>")

# From bytes
image = Attachment.from_bytes(b"<image_bytes>")
```

### Example 1: Describe an image

```python
import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM
from gllm_inference.schema import Attachment

image = Attachment.from_path("path/to/dog.jpeg")

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)
response = asyncio.run(lm_invoker.invoke(["What is this?", image]))
print(f"Response: {response}")
```

**Output:**

```
Response: A cute golden retriever puppy.
```

***

### Example 2: Analyze a PDF

```python
import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM
from gllm_inference.schema import Attachment

URL = "https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf"
document = Attachment.from_url(URL)

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)
response = asyncio.run(lm_invoker.invoke(["What is the title of this file?", document]))
print(f"Response: {response}")
```

**Output:**

```
Response: This is the scientific paper titled "Attention Is All You Need".
```

***

### Supported Attachment Types

Each LM may support different input types. As of now, `OpenAILMInvoker` supports images and documents. You can find more about the supported types for each LM Invoker [here](https://api.python.docs.glair.ai/generative-internal/library/gllm_inference/api/lm_invoker.html).

***

## Structured Output

In many real-world applications, we don't just want natural language responses — we want **structured data** that our programs can parse and use directly.

You can define your expected output using:

1. A **Pydantic `BaseModel` class** (recommended).
2. A **JSON schema dictionary** compatible with Pydantic's schema format.

When structured output is enabled, the output will be stored in the `structured_output` attribute of the response. The output type depends on the input schema:

1. **Pydantic `BaseModel` class** → The output will be a Pydantic model instance.
2. **JSON schema dict** → The output will be a Python dictionary.

### Using a Pydantic **`BaseModel`** (Recommended)

You can define your expected response format as a Pydantic class. This ensures strong type safety and makes the output easier to work with in Python.

```python
import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM
from pydantic import BaseModel

class Animal(BaseModel):
    name: str
    size: str
    diet: str
    color: list[str]
    legs: int

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, response_schema=Animal)
response = asyncio.run(lm_invoker.invoke("Describe a chicken!"))
print(f"Response: {response}")
```

**Output:**

```
Response: LMOutput(
    structured_output=Animal(
        name='Chicken',
        size='Medium', 
        diet='Omnivore', 
        color=['varies by breed'], 
        legs=2,
    )
)
```
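
Because the result shown above is a regular Pydantic instance, its fields can be used directly in Python:

```python
animal = response.structured_output
print(animal.name, animal.legs)  # Chicken 2
```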

### Using a JSON Schema Dictionary

Alternatively, you can define the structure using a JSON schema dictionary.

```python
import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

animal_schema = {
    "properties": {
        "color": {
            "items": {"type": "string"},
            "title": "Color",
            "type": "array"
        },
        "diet": {"title": "Diet", "type": "string"},
        "legs": {"title": "Legs", "type": "integer"},
        "name": {"title": "Name", "type": "string"},
        "size": {"title": "Size", "type": "string"}
    },
    "required": ["name", "size", "diet", "color", "legs"],
    "title": "Animal",
    "type": "object",
}

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, response_schema=animal_schema)
response = asyncio.run(lm_invoker.invoke("Describe a chicken!"))
print(f"Response: {response}")
```

**Output:**

```
Response: LMOutput(
    structured_output={
        'color': ['reddish-brown feathers', 'white underparts'], 
        'diet': 'Omnivore; seeds, grains, insects, greens', 
        'legs': 2, 
        'name': 'Chicken', 
        'size': 'Medium',
    }
)
```

If a JSON schema is used, it must still be compatible with Pydantic's JSON schema format, especially for complex schemas. For this reason, it is recommended to create the JSON schema using Pydantic's `model_json_schema` method:

```python
animal_schema = Animal.model_json_schema()
```

***

## Tool Calling

**Tool calling** means letting a language model **call external functions** to help it solve a task. It allows the AI to **interact with external functions and APIs** during the conversation, enabling dynamic computation, data retrieval, and complex workflows.

Think of it as:

> The LM is smart at reading and reasoning, but when it needs to calculate or get external data, it picks up the phone and calls your "tool".

### Tool Definition

We can easily convert a function into a tool using the `tool` decorator.

```python
from gllm_core.schema.tool import tool

@tool
def add(a: int, b: int) -> int:
    """Add two numbers.
    
    Args:
        a (int): The first number.
        b (int): The second number.
        
    Returns:
        int: The added number.
    """
    return a + b
```

### LM Invocation with Tool

Let's try to integrate the above tool to our LM invoker!

```python
import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM
from gllm_core.schema.tool import tool

@tool
def add(a: int, b: int) -> int:
    """Add two numbers.
    
    Args:
        a (int): The first number.
        b (int): The second number.
        
    Returns:
        int: The added number.
    """
    return a + b

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, tools=[add])
response = asyncio.run(lm_invoker.invoke("What is 25 + 15?"))
print(f"Response: {response}")
```

**Output:**

```
Response: LMOutput(
    tool_calls=[ToolCall(id='call_123', name='add', args={'a': 25, 'b': 15})]
)
```

When the LM invoker is invoked with tool calling enabled, the model returns the tool calls instead of a final answer. In this case, we still need to execute the tools and feed the results back to the LM invoker ourselves, as sketched below. If you'd like to handle this looping process automatically, please refer to the [LM Request Processor](https://gdplabs.gitbook.io/sdk/~/revisions/beykCxz0UanaEX0sPJJu/tutorials/inference/lm-request-processor) component.
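
Below is a minimal sketch of that manual loop, for illustration only. It maps tool names to plain Python implementations, executes the requested calls, and feeds the results back as ordinary text; a production flow would use proper tool-result messages, which is exactly what the LM Request Processor automates.

```python
import asyncio
from gllm_core.schema.tool import tool
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

@tool
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

# Plain Python implementations, keyed by tool name (a hypothetical registry).
TOOL_IMPLEMENTATIONS = {"add": lambda a, b: a + b}

async def main():
    lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, tools=[add])
    response = await lm_invoker.invoke("What is 25 + 15?")

    # Execute each tool call requested by the model ourselves.
    results = [
        f"{call.name}({call.args}) = {TOOL_IMPLEMENTATIONS[call.name](**call.args)}"
        for call in response.tool_calls
    ]

    # Feed the results back as plain text (illustration only), using a second
    # invoker without tools so the follow-up returns a plain answer.
    plain_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO)
    follow_up = f"What is 25 + 15? Tool results: {results}. Answer concisely."
    print(await plain_invoker.invoke(follow_up))

asyncio.run(main())
```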

***

## Output Analytics

**Output analytics** enables you to collect detailed metrics and insights about your language model invocations. When output analytics is enabled, the response includes the following extra attributes:

1. **`token_usage`**: Input and output token counts.
2. **`duration`**: Time taken to generate the response.
3. **`finish_details`**: Additional details about how the generation finished.

To enable output analytics, simply set the `output_analytics` parameter to `True`.

```python
import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, output_analytics=True)
response = asyncio.run(lm_invoker.invoke("What is the capital of France?"))
print(f"Response: {response}")
```

**Output:**

```
Response: LMOutput(
    response=Paris.,
    token_usage=input_tokens=13 output_tokens=8,
    duration=2.211723566055298,
    finish_details={'status': 'completed', 'incomplete_details': {'reason': None}}
)
```
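
Following the printed repr above, each metric can also be read individually from the `LMOutput` attributes:

```python
print(f"Input tokens  : {response.token_usage.input_tokens}")
print(f"Output tokens : {response.token_usage.output_tokens}")
print(f"Duration      : {response.duration:.2f}s")
print(f"Finish details: {response.finish_details}")
```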

***

## Retry & Timeout

**Retry & timeout** functionality provides **robust error handling and reliability** for language model interactions. It allows you to **automatically retry failed requests** and **set time limits** for operations, ensuring your applications remain responsive and resilient to network issues or API failures.

Retry & timeout can be configured via the parameters of the `RetryConfig` class:

1. **max\_retries**: Maximum number of retry attempts (defaults to 3).
2. **timeout**: Maximum time in seconds to wait for each request (defaults to 30.0 seconds). To disable the timeout, set this parameter to 0.0.

You can also configure the other parameters available [here](https://github.com/GDP-ADMIN/gl-sdk/blob/main/libs/gllm-core/gllm_core/utils/retry.py#L28-L44). Now let's apply it to our LM invoker!

```python
import asyncio
from gllm_core.utils.retry import RetryConfig
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

retry_config = RetryConfig(max_retries=3, timeout=100)
lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, retry_config=retry_config)
response = asyncio.run(lm_invoker.invoke("What is the capital of France?"))
print(f"Response: {response}")
```
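
As noted above, setting `timeout` to `0.0` disables the per-request time limit entirely:

```python
retry_config = RetryConfig(max_retries=3, timeout=0.0)  # retries enabled, no timeout
```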

***

## Reasoning

Certain language model providers and models support reasoning. When reasoning is used, models think before they answer, producing an internal chain of thought before responding to the user. Reasoning allows models to perform advanced tasks such as complex problem solving, coding, scientific reasoning, and multi-step planning for agentic workflows.

However, it's important to note that the way reasoning works varies between providers, so check how each class handles reasoning before using it.

In the case of `OpenAILMInvoker`, reasoning is available when using reasoning models (often referred to as the **o-series** models). We can then set the `reasoning_summary` parameter to output the reasoning tokens.

The reasoning tokens will be stored in the `reasoning` attribute of the output.

```python
import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

lm_invoker = OpenAILMInvoker(OpenAILM.O4_MINI, reasoning_summary="detailed")
response = asyncio.run(lm_invoker.invoke("Solve x^2 + 2x + 1 = 0"))

for item in response.reasoning:
    print(f"=== Reasoning ===\n{item.reasoning}\n")

print(f"=== Response ===\n{response.response}\n")
```

**Output:**

```
=== Reasoning ===
**Solving the quadratic equation**

The user wants to solve the equation x² + 2x + 1 = 0. 
I see that this can be factored as (x + 1)² = 0, which gives a double root at x = -1. 
So, the solution is x = -1, with multiplicity two. 
I also checked the discriminant, which confirms this: b² - 4ac = 4 - 4 = 0. 
It might be useful to show the steps, including factorization and mention that it's a perfect square with a repeated root.

=== Response ===
The equation factors as  
x² + 2x + 1 = (x + 1)² = 0  

Hence the only solution (a double root) is  
x = –1.
```

## Web Search

Web search is a built-in tool that allows the language model to search the web for relevant information. Currently, **this is only supported by the `OpenAILMInvoker`**. This feature can be enabled by setting the `web_search` parameter to `True`.

When it's enabled, the output will include citations to the sources retrieved while generating the response. These are stored in the `citations` attribute.

```python
import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, web_search=True)
response = asyncio.run(lm_invoker.invoke("What was the score of Real Madrid's last match?"))

for item in response.citations:
    print(f"=== Citation ===\n{item.metadata}\n")

print(f"=== Response ===\n{response.response}\n")
```

**Output:**

```
=== Citation ===
{
    'type': 'url_citation', 
    'url': 'https://www.reuters.com/sports/soccer/new-arrivals-handled-pressure-well-laliga-opener-says-real-madrids-alonso-2025-08-20/',
    'title': "New arrivals handled the pressure well in LaLiga opener, says Real Madrid's Alonso | Reuters", 
    'start_index': 192, 
    'end_index': 332, 
}

=== Response ===
Real Madrid's last match (as of Aug 20, 2025) finished Real Madrid 1-0 Osasuna in the La Liga 2025-26 opener on August 19, 2025. 
The goal came from a Kylian Mbappé penalty in the 51st minute. 
([reuters.com](https://www.reuters.com/sports/soccer/new-arrivals-handled-pressure-well-laliga-opener-says-real-madrids-alonso-2025-08-20/))

If you’d like, I can pull up a full match report or check for any updates from later today.
```

## Code Interpreter

Code interpreter is a feature that allows the language model to write and run Python code in a sandboxed environment to solve complex problems in domains like data analysis, coding, and math. Currently, **this is only supported by the `OpenAILMInvoker`**. This feature can be enabled by setting the `code_interpreter` parameter to `True`.

When it's enabled, the output will store the code execution results in the `code_exec_results` attribute.

Since OpenAI models internally recognize the code interpreter as the `Python tool`, it's recommended to explicitly instruct the model to use the `Python tool` to ensure more reliable code execution. Let's try it on a simple math problem!

```python
import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM

lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_NANO, code_interpreter=True)
response = asyncio.run(lm_invoker.invoke("Use the Python tool to calculate 4.77 * 7.44"))

for item in response.code_exec_results:
    print(f"=== Code Execution Result ===\n{item}\n")

print(f"=== Response ===\n{response.response}\n")
```

**Output:**

```
=== Code Execution Result ===
CodeExecResult(
    id='ci_68a59ef2eee081958b15409834488d310c99b82ee9b2c6a9', 
    code='print(4.77*7.44)', 
    output=['35.4888\n'],
)

=== Response ===
35.4888
```

What's awesome about the code interpreter is that it can produce **more than just text**! In the example below, let's create a histogram using the code interpreter and save any generated attachment to a local path.

```python
import asyncio
from gllm_core.utils.retry import RetryConfig
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.model import OpenAILM
from gllm_inference.schema import Attachment

query = "Use the Python tool generate a histogram for the following data: [1, 1, 3, 2, 4, 1, 2]. Make it light blue."
retry_config = RetryConfig(timeout=60)
lm_invoker = OpenAILMInvoker(OpenAILM.GPT_5_MINI, code_interpreter=True, retry_config=retry_config)
response = asyncio.run(lm_invoker.invoke(query))

for item in response.code_exec_results:
    print(f"=== Code Execution Result ===\n{item}\n")
    
    for output in item.output:
        if isinstance(output, Attachment):
            output.write_to_file("path/to/output.png")

print(f"=== Response ===\n{response.response}\n")
```

**Output:**

```
=== Code Execution Result ===
CodeExecResult(
    id='ci_68a5a079296c8195a47fc7027ff7850906af19557003f1ce', 
    code="""
        # Generating and saving a histogram for the provided data
        import matplotlib.pyplot as plt
        
        data = [1, 1, 3, 2, 4, 1, 2]

        plt.figure(figsize=(6, 4))
        # Use bins centered on integer values 1-4
        plt.hist(data, bins=[0.5, 1.5, 2.5, 3.5, 4.5], color='lightblue', edgecolor='black')
        plt.xticks([1, 2, 3, 4])
        plt.xlabel('Value')
        plt.ylabel('Frequency')
        plt.title('Histogram of Given Data')
        plt.tight_layout()
        
        # Save the figure
        output_path = '/mnt/data/histogram.png'
        plt.savefig(output_path, dpi=150)
        plt.show()
        
        output_path"
    """, 
    output=[
        Attachment(
            filename='f99f718d-30a5-493f-b4c8-97ac35cab552.png',
            extension='png',
            mime_type='image/png',
            url=None,
            data='89504e470d0a1a0a0000...'
        ), 
        "'/mnt/data/histogram.png'",
    ]
)

=== Response ===
I've created the histogram and saved it.
```

Below is the generated histogram that was saved to our local path. What an awesome way to use a language model!

<figure><img src="https://2275014547-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fl9KB7mDeg58CszyWm4TI%2Fuploads%2FtwRkPCKY4rvVNYhKMhuu%2Fimage.png?alt=media&#x26;token=5f4128c4-79ea-4b3d-ab3e-912c03d815d8" alt=""><figcaption></figcaption></figure>

## MCP Server Integration

{% hint style="warning" %}
Coming Soon!
{% endhint %}

## Batch Processing

Batch processing is a feature that allows the language model to process multiple requests in a single batch job. Batch processing is generally **cheaper** than standard processing, but **slower** in exchange. Thus, it's suitable for **large volumes of requests that are not latency-sensitive**.

Currently, batch processing is only available for certain LM invokers. This feature can be accessed via the `batch` attribute of the LM invoker. As an example, let's try executing a batch processing request using the `AnthropicLMInvoker`:

```python
import asyncio
from gllm_core.utils import RetryConfig
from gllm_inference.lm_invoker import AnthropicLMInvoker

lm_invoker = AnthropicLMInvoker("claude-sonnet-4-20250514", retry_config=RetryConfig(timeout=360))

requests = {
    f"request_{letter}": f"Name an animal that starts with the letter '{letter}'"
    for letter in "ABCDE"
}

async def main():
    results = await lm_invoker.batch.invoke(requests)

    print("Results:")
    for result_id, result in results.items():
        print(f">> {result_id}: {result.text}")

if __name__ == "__main__":
    asyncio.run(main())
```

**Output:**

```
Results:
>> request_A: Alligator.
>> request_B: Bear.
>> request_C: Cat.
>> request_D: Dog.
>> request_E: Elephant.
```

Alternatively, the following standalone batch processing operations can also be executed separately:

### Create a Batch Job

```python
requests = {
    "request_1": "What color is the sky?", 
    "request_2": "What color is the grass?",
}
batch_id = await lm_invoker.batch.create(requests)
```

### Get a Batch Job Status

```python
status = await lm_invoker.batch.status(batch_id)
```

### Retrieve a Batch Job's Results

```python
results = await lm_invoker.batch.retrieve(batch_id)
```

### List Batch Jobs

```python
batch_jobs = await lm_invoker.batch.list()
```

### Cancel a Batch Job

```python
await lm_invoker.batch.cancel(batch_id)
```
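
These standalone operations can be combined into a simple create-poll-retrieve loop. The sketch below is an illustration only: the exact values returned by `batch.status(...)` are not documented here, so the `"in_progress"` check is a placeholder assumption; consult the API reference for the real status values.

```python
import asyncio
from gllm_inference.lm_invoker import AnthropicLMInvoker

async def run_batch(requests: dict[str, str]):
    lm_invoker = AnthropicLMInvoker("claude-sonnet-4-20250514")
    batch_id = await lm_invoker.batch.create(requests)

    # Poll until the job leaves the in-progress state.
    # NOTE: "in_progress" is a placeholder, not a documented status value.
    while await lm_invoker.batch.status(batch_id) == "in_progress":
        await asyncio.sleep(30)

    return await lm_invoker.batch.retrieve(batch_id)

results = asyncio.run(run_batch({"request_1": "What color is the sky?"}))
```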
