Standard Image Modality Transformer

Introduction

The Standard Image Modality Transformer uses a router to send each image to the converter best suited to its type. For instance, diagram images are routed to an Image-to-Mermaid converter, which yields more accurate results than a standard Image-to-Caption converter.

Installation

# Optional: run this inside a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-multimodal"

Quickstart

Initialize the Standard Image Modality Transformer through the built-in preset, which saves you the hassle of initializing all the required components.

import asyncio

from gllm_inference.schema import Attachment
from gllm_multimodal.modality_transformer.image_modality_transformer.standard_image_modality_transformer import StandardImageModalityTransformer

# Load the image from disk as an attachment.
image = Attachment.from_path("./flowchart.jpg")
# Build the transformer from the built-in "domain_specific" preset.
transformer = StandardImageModalityTransformer.from_preset("domain_specific")
# transform() is a coroutine, so run it on an event loop.
result = asyncio.run(transformer.transform(image.data))
print(result)

Output:

Custom Router

For more control over the components, the image transformer can be initialized with a custom router and a route_mapping. The route_mapping is a dictionary that defines which converters handle each route.
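To make the relationship between a router and a route_mapping concrete, here is a minimal, library-independent sketch. It does not use the gllm-multimodal API: Route, toy_router, to_mermaid, to_caption, and transform are hypothetical names invented for illustration, and the converters return placeholder strings instead of calling a model.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable, Dict, List


class Route(str, Enum):
    """Hypothetical route names; the real library may define different ones."""
    DIAGRAM = "diagram"
    PHOTO = "photo"


# A converter is assumed here to be an async callable from image bytes to text.
Converter = Callable[[bytes], Awaitable[str]]


async def to_mermaid(image: bytes) -> str:
    # Placeholder for an Image-to-Mermaid converter.
    return "graph TD; A-->B"


async def to_caption(image: bytes) -> str:
    # Placeholder for an Image-to-Caption converter.
    return "a generic caption"


async def toy_router(image: bytes) -> Route:
    # Toy rule standing in for a real image classifier.
    return Route.DIAGRAM if image.startswith(b"DIAGRAM") else Route.PHOTO


# The route mapping: each route lists the converters that should handle it.
route_mapping: Dict[Route, List[Converter]] = {
    Route.DIAGRAM: [to_mermaid],
    Route.PHOTO: [to_caption],
}


async def transform(image: bytes) -> List[str]:
    # Pick a route first, then run every converter mapped to that route.
    route = await toy_router(image)
    return [await convert(image) for convert in route_mapping[route]]


if __name__ == "__main__":
    print(asyncio.run(transform(b"DIAGRAM-bytes")))
```

Keeping the mapping as plain data means a route can be pointed at a different converter without touching the routing logic. Consult the library reference for the actual router and converter classes.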

Skip Routing

You can also skip the routing step. In that case, the image is sent directly to the router's default route.
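The behaviour described above can be sketched without the library. This is only an illustration under stated assumptions: the skip_routing flag, DEFAULT_ROUTE, classify, and the converter functions are hypothetical names, not the gllm-multimodal API.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# A converter is assumed here to be an async callable from image bytes to text.
Converter = Callable[[bytes], Awaitable[str]]


async def to_mermaid(image: bytes) -> str:
    # Placeholder for a diagram-specific converter.
    return "mermaid"


async def to_caption(image: bytes) -> str:
    # Placeholder for a general-purpose converter.
    return "caption"


# Hypothetical default route used when routing is skipped.
DEFAULT_ROUTE = "default"

ROUTE_MAPPING: Dict[str, Converter] = {
    "diagram": to_mermaid,
    DEFAULT_ROUTE: to_caption,
}


async def classify(image: bytes) -> str:
    # Toy rule standing in for a real image classifier.
    return "diagram" if image.startswith(b"DIAGRAM") else DEFAULT_ROUTE


async def transform(image: bytes, skip_routing: bool = False) -> str:
    # When routing is skipped, the classifier is never called and the
    # default route is used unconditionally.
    route = DEFAULT_ROUTE if skip_routing else await classify(image)
    return await ROUTE_MAPPING[route](image)


if __name__ == "__main__":
    print(asyncio.run(transform(b"DIAGRAM-bytes", skip_routing=True)))
```

In this sketch, skipping routing avoids the classification call entirely, which is the likely motivation when every input is known to belong to the default route.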
