Multimodal Input
What if your prompt includes more than just text? Our LM Invokers support attachments such as images and documents, so you can send rich content and ask the model to analyze or describe it.
What’s an Attachment?
An Attachment is a content object that can be:
A remote file (URL)
A local file
A data URL
Raw bytes
Load Attachments
from gllm_inference.schema import Attachment
# From remote URL
image = Attachment.from_url("https://example.com/image.png")
# From local file
image = Attachment.from_path("path/to/image.jpeg")
# From Data URL
image = Attachment.from_data_url("data:image/jpeg;base64,<base64_encoded_image>")
# From bytes
image = Attachment.from_bytes(b"<image_bytes>")
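If you only have raw bytes but need the data URL form shown above, the string can be built with the standard library. The following is a minimal sketch: `to_data_url` is a hypothetical helper (not part of gllm_inference), and the sample bytes are just the 8-byte PNG signature used as a placeholder rather than a complete image.

```python
import base64


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL (hypothetical helper, not a library API)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes for illustration only (the PNG file signature);
# in practice, read the bytes from a real image file.
sample_bytes = b"\x89PNG\r\n\x1a\n"
data_url = to_data_url(sample_bytes, "image/png")
print(data_url)
```

The resulting string has the same `data:<mime_type>;base64,<payload>` shape as the value passed to `Attachment.from_data_url` above.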
Example 1: Describe an image
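import asyncio
# The OpenAILMInvoker import path is assumed here; adjust it if your installed version differs.
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.schema import Attachment, PromptRole
# Load the image from a local file
image = Attachment.from_path("path/to/dog.jpeg")
# Each message is a (role, contents) pair; contents can mix text and attachments
prompt = [
    (PromptRole.SYSTEM, ["Reply with concise answers!"]),
    (PromptRole.USER, ["What is this?", image]),
]
lm_invoker = OpenAILMInvoker("gpt-4.1-nano")
# invoke() is a coroutine, so run it in an event loop
response = asyncio.run(lm_invoker.invoke(prompt))
print(f"Response: {response}")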
Output:
Response: A cute golden retriever puppy.
Example 2: Analyze a PDF
document = Attachment.from_url(
    "https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf"
)
prompt = [
    (PromptRole.SYSTEM, ["Reply with concise answers!"]),
    (PromptRole.USER, ["What is this?", document]),
]
lm_invoker = OpenAILMInvoker("gpt-4.1-nano")
response = asyncio.run(lm_invoker.invoke(prompt))
print(f"Response: {response}")
Output:
Response: This is the scientific paper titled "Attention Is All You Need".
Supported Attachment Types
Each LM may support different input types. Currently, OpenAILMInvoker supports:
Documents (PDF, DOCX, etc.)
Images (JPEG, PNG, etc.)
You can find more about the attachment types each LM Invoker supports here.
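Because support varies by invoker, it can help to check a file's type before building an attachment from it. The sketch below is only an illustration: `attachment_kind` and the two extension sets are hypothetical, cover just the formats named above, and are not an authoritative list of what any invoker accepts.

```python
from pathlib import Path
from typing import Optional

# Illustrative, non-exhaustive sets based on the formats named above (assumption).
IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png"}
DOCUMENT_EXTENSIONS = {".pdf", ".docx"}


def attachment_kind(path: str) -> Optional[str]:
    """Classify a file by extension as "image" or "document", else None (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in DOCUMENT_EXTENSIONS:
        return "document"
    return None


for candidate in ["path/to/dog.jpeg", "paper.PDF", "notes.txt"]:
    print(candidate, "->", attachment_kind(candidate))
```

A `None` result only means the extension is missing from these illustrative sets, so treat it as a prompt to check the invoker's documentation rather than as proof that the file is unsupported.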