Batch Invocation
What is batch invocation?
Batch invocation is a feature that allows the language model to process multiple requests in a single call. Batch invocations are generally cheaper than standard invocations, but slower in exchange. This makes them suitable for large volumes of requests where latency is not a concern.
Batch invocation is only available for certain LM invokers. This feature can be accessed via the batch attribute of the LM invoker. As an example, let's try executing a batch invocation using the AnthropicLMInvoker:
import asyncio

from gllm_core.utils import RetryConfig
from gllm_inference.lm_invoker import AnthropicLMInvoker

# Use a generous timeout, since batch jobs can take a while to complete.
lm_invoker = AnthropicLMInvoker("claude-sonnet-4-20250514", retry_config=RetryConfig(timeout=360))

# Map unique request IDs to their prompts.
requests = {
    f"request_{letter}": f"Name an animal that starts with the letter '{letter}'"
    for letter in "ABCDE"
}

async def main():
    # Submit all requests as a single batch job and wait for the results.
    results = await lm_invoker.batch.invoke(requests)
    print("Results:")
    for result_id, result in results.items():
        print(f">> {result_id}: {result.text}")

if __name__ == "__main__":
    asyncio.run(main())
Alternatively, each step of a batch invocation can be executed separately via the following standalone operations:
Create a Batch Job
We can create a batch job by using the create() method.
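The exact signature is not shown in this tutorial, so the sketch below rests on assumptions: that create() accepts the same mapping of request IDs to prompts used by invoke() above, that it is awaitable like invoke(), and that it returns an identifier for the new batch job. Continuing from the earlier setup, inside an async function:

# Assumption: create() mirrors invoke()'s request mapping and returns a job ID.
batch_id = await lm_invoker.batch.create(requests)
print(f"Created batch job: {batch_id}")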
Get a Batch Job Status
The status of a batch job can be checked via the status() method.
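For illustration, a minimal sketch assuming status() is awaitable, takes the batch job ID returned by create(), and returns the job's current status:

# Assumption: status() accepts the job ID and returns a status value.
status = await lm_invoker.batch.status(batch_id)
print(f"Batch job status: {status}")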
Retrieve a Batch Job's Results
Once a batch job is done, the results can be retrieved with the retrieve() method.
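As a sketch, assuming retrieve() is awaitable, takes the batch job ID, and returns the same mapping of request IDs to results that invoke() produces:

# Assumption: retrieve() returns a mapping of request IDs to results.
results = await lm_invoker.batch.retrieve(batch_id)
for result_id, result in results.items():
    print(f">> {result_id}: {result.text}")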
List Batch Jobs
The list of currently available batch jobs is accessible via the list() method.
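For illustration, assuming list() is awaitable, takes no arguments, and returns an iterable of the available batch jobs:

# Assumption: list() returns an iterable of batch job entries.
batch_jobs = await lm_invoker.batch.list()
for job in batch_jobs:
    print(job)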
Cancel a Batch Job
If desired, a batch job can be cancelled using the cancel() method.
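A minimal sketch, assuming cancel() is awaitable and takes the ID of the batch job to be cancelled:

# Assumption: cancel() accepts the job ID of the job to cancel.
await lm_invoker.batch.cancel(batch_id)
print(f"Cancelled batch job: {batch_id}")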