Multimodal Input

What if your prompt includes more than just text? Good news: our LM Invokers support attachments such as images and documents! This lets you send rich content and ask the model to analyze or describe it.

What’s an Attachment?

An Attachment is a content object that can be:

  • A remote file (URL)

  • A local file

  • A data URL

  • Raw bytes

Load Attachments

from gllm_inference.schema import Attachment

# From remote URL
image = Attachment.from_url("https://example.com/image.png")

# From local file
image = Attachment.from_path("path/to/image.jpeg")

# From Data URL
image = Attachment.from_data_url("data:image/jpeg;base64,<base64_encoded_image>")

# From bytes
image = Attachment.from_bytes(b"<image_bytes>")
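If the file contents are already in memory, from_bytes lets you wrap them directly. As a minimal sketch (the file path is just a placeholder), you can read a local file yourself and pass the raw bytes:

# Read the file yourself, then wrap the raw bytes as an Attachment.
with open("path/to/image.jpeg", "rb") as f:
    image = Attachment.from_bytes(f.read())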

Example 1: Describe an image

import asyncio

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.schema import Attachment, PromptRole

image = Attachment.from_path("path/to/dog.jpeg")
prompt = [
    (PromptRole.SYSTEM, ["Reply with concise answers!"]),
    (PromptRole.USER, ["What is this?", image]),
]

lm_invoker = OpenAILMInvoker("gpt-4.1-nano")
response = asyncio.run(lm_invoker.invoke(prompt))
print(f"Response: {response}")

Output:

Response: A cute golden retriever puppy.
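A user message is simply a list of contents, so it isn't limited to a single attachment. Here is a minimal sketch that sends two images in one message, assuming the prompt format above also accepts multiple attachments per message (the file paths are placeholders):

dog = Attachment.from_path("path/to/dog.jpeg")
cat = Attachment.from_path("path/to/cat.jpeg")

# Mix text and multiple attachments in a single user message.
prompt = [
    (PromptRole.SYSTEM, ["Reply with concise answers!"]),
    (PromptRole.USER, ["What are the differences between these two animals?", dog, cat]),
]

response = asyncio.run(lm_invoker.invoke(prompt))
print(f"Response: {response}")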

Example 2: Analyze a PDF

document = Attachment.from_url(
    "https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf"
)

prompt = [
    (PromptRole.SYSTEM, ["Reply with concise answers!"]),
    (PromptRole.USER, ["What is this?", document]),
]

lm_invoker = OpenAILMInvoker("gpt-4.1-nano")
response = asyncio.run(lm_invoker.invoke(prompt))
print(f"Response: {response}")

Output:

Response: This is the scientific paper titled "Attention Is All You Need".
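The examples above use asyncio.run for simplicity. If your code already runs inside an event loop (for example, a web handler), await the invoker directly instead. A minimal sketch, assuming the same invoke coroutine shown above:

async def describe(attachment):
    # Await the invoker directly when an event loop is already running.
    prompt = [
        (PromptRole.SYSTEM, ["Reply with concise answers!"]),
        (PromptRole.USER, ["What is this?", attachment]),
    ]
    return await lm_invoker.invoke(prompt)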

Supported Attachment Types

Each LM might support different types of input. As of now, OpenAILMInvoker supports:

  • Documents (PDF, DOCX, etc.)

  • Images (JPEG, PNG, etc.)

You can find more about the supported types for each LM Invoker here.
