Standard Image Modality Transformer
Introduction
The Standard Image Modality Transformer utilizes a router to direct images to the most appropriate converter based on their specific type. For instance, diagram images are routed to an Image-to-Mermaid converter, which yields more accurate results than a standard Image-to-Caption converter.
Installation
# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-multimodal" # you can use a Conda environment
$token = (gcloud auth print-access-token)
pip install --extra-index-url "https://oauth2accesstoken:$token@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-multimodal"# you can use a Conda environment
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "gllm-multimodal"Quickstart
Initialize the Standard Image Modality Transformer through the built-in preset, which saves you the hassle of initializing all the required components.
import asyncio
from gllm_inference.schema import Attachment
from gllm_multimodal.modality_transformer.image_modality_transformer.standard_image_modality_transformer import StandardImageModalityTransformer
image = Attachment.from_path("./flowchart.jpg")
transformer = StandardImageModalityTransformer.from_preset("domain_specific")
result = asyncio.run(transformer.transform(image.data))
print(result)Output:
Custom Router
For more customizable component, image transformer can be initialized with a custom router and route_mapping . The route_mapping is a dictionary defining which converters should handle each route.
Skip Routing
You can also skip the routing process. It will automatically route the image to the router's default route.
Last updated