Batch Invocation


What is batch invocation?

Batch invocation is a feature that allows the language model to process multiple requests in a single call. Batch invocation is generally cheaper than standard invocation, but slower in exchange. Thus, it's suitable for processing large numbers of requests when latency is not a concern.

Batch invocation is only available for certain LM invokers. This feature can be accessed via the batch attribute of the LM invoker. As an example, let's try executing a batch invocation using the AnthropicLMInvoker:

import asyncio
from gllm_core.utils import RetryConfig
from gllm_inference.lm_invoker import AnthropicLMInvoker

# Batch jobs can take a while to complete, so a generous timeout is used.
lm_invoker = AnthropicLMInvoker("claude-sonnet-4-20250514", retry_config=RetryConfig(timeout=360))

# Map each request to a unique ID so the results can be matched back to it.
requests = {
    f"request_{letter}": f"Name an animal that starts with the letter '{letter}'"
    for letter in "ABCDE"
}

async def main():
    # Submit all requests as a single batch and wait for the results.
    results = await lm_invoker.batch.invoke(requests)

    print("Results:")
    for result_id, result in results.items():
        print(f">> {result_id}: {result.text}")

if __name__ == "__main__":
    asyncio.run(main())

Output:
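
The printed results should look similar to the following (the exact animals named will vary between runs):

Results:
>> request_A: Alligator
>> request_B: Bear
>> request_C: Cat
>> request_D: Dog
>> request_E: Elephant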

Alternatively, the following standalone batch operations can be executed individually:

Create a Batch Job

We can create a batch job using the create() method.
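
A minimal sketch, reusing the lm_invoker and requests from the example above, and assuming create() is awaitable like invoke(), accepts the same request mapping, and returns a batch job ID (the helper name and return type here are illustrative assumptions):

async def create_batch() -> str:
    # Submit the requests as a batch job without waiting for completion.
    # Assumption: create() accepts the same request mapping as invoke()
    # and returns an ID that identifies the batch job.
    batch_id = await lm_invoker.batch.create(requests)
    print(f"Created batch job: {batch_id}")
    return batch_id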

Get a Batch Job's Status

The status of a batch job can be checked via the status() method.
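
For example, given a batch_id returned by create(), and assuming status() is awaitable and returns a printable status indicator (both assumptions):

async def check_status(batch_id: str) -> None:
    # Poll the current state of a previously created batch job.
    status = await lm_invoker.batch.status(batch_id)
    print(f"Batch job {batch_id} status: {status}")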

Retrieve a Batch Job's Results

Once a batch job is done, the results can be retrieved with the retrieve() method.
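
A sketch, assuming retrieve() is awaitable and returns the same mapping of request IDs to results that batch.invoke() produces (an assumption, not confirmed by the library docs):

async def get_results(batch_id: str) -> None:
    # Fetch the results of a completed batch job and print each one,
    # mirroring the loop from the batch.invoke() example above.
    results = await lm_invoker.batch.retrieve(batch_id)
    for result_id, result in results.items():
        print(f">> {result_id}: {result.text}")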

List Batch Jobs

The list of currently available batch jobs is accessible via the list() method.
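
A sketch, assuming list() is awaitable and returns an iterable of batch job descriptors (the exact return type is an assumption):

async def list_jobs() -> None:
    # Enumerate the batch jobs that are currently available.
    batch_jobs = await lm_invoker.batch.list()
    for job in batch_jobs:
        print(job)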

Cancel a Batch Job

If desired, a batch job can be cancelled using the cancel() method.
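
A sketch, assuming cancel() is awaitable and takes the ID of the job to cancel (both assumptions):

async def cancel_job(batch_id: str) -> None:
    # Cancel a batch job that has not finished yet.
    await lm_invoker.batch.cancel(batch_id)
    print(f"Cancelled batch job: {batch_id}")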
