Standard Image Modality Transformer

Introduction

The Standard Image Modality Transformer utilizes a router to direct images to the most appropriate converter based on their specific type. For instance, diagram images are routed to an Image-to-Mermaid converter, which yields more accurate results than a standard Image-to-Caption converter.

Installation

# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-multimodal" 

Quickstart

Initialize the Standard Image Modality Transformer through the built-in preset, which saves you the hassle of initializing all the required components.

file-image
22KB
import asyncio

from gllm_inference.schema import Attachment
from gllm_multimodal.modality_transformer.image_modality_transformer.standard_image_modality_transformer import StandardImageModalityTransformer

image = Attachment.from_path("./flowchart.jpg")
transformer = StandardImageModalityTransformer.from_preset("domain_specific")
result = asyncio.run(transformer.transform(image.data))
print(result)

Output:

Custom Router

For more customizable component, image transformer can be initialized with a custom router and route_mapping . The route_mapping is a dictionary defining which converters should handle each route.

Skip Routing

You can also skip the routing process. It will automatically route the image to the router's default route.

Last updated