Generic Image Modality Transformer

Introduction

The Generic Image Modality Transformer is a variant of the Image Modality Transformer that uses only a single converter and no router.
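
As a rough illustration of the "one converter, no router" idea, the sketch below uses hypothetical stand-in names (SingleConverterTransformer, fake_caption) that are not part of gllm-multimodal. It only assumes that a converter is an async callable and that the transformer forwards every input to it unchanged; the real classes may behave differently.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stand-in type, not the gllm-multimodal interface:
# a converter is any async callable that turns image bytes into text.
Converter = Callable[[bytes], Awaitable[str]]


class SingleConverterTransformer:
    """Illustrative only: wraps exactly one converter and has no router."""

    def __init__(self, converter: Converter) -> None:
        self.converter = converter

    async def transform(self, source: bytes) -> str:
        # No routing step: every input is sent to the same converter.
        return await self.converter(source)


async def fake_caption(source: bytes) -> str:
    # Placeholder for a real image-to-caption model call.
    return f"caption for {len(source)} bytes"


transformer = SingleConverterTransformer(fake_caption)
result = asyncio.run(transformer.transform(b"\x89PNG"))
print(result)
```

With a router, the transformer would first have to decide which converter fits the image; here that decision is made once, by whoever constructs the transformer.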

Installation

# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-multimodal" 

Quickstart

Initialize the Generic Image Modality Transformer by passing the modality converter into it.

import asyncio

from gllm_inference.schema import Attachment
from gllm_multimodal.modality_converter.image_to_text.image_to_caption import LMBasedImageToCaption
from gllm_multimodal.modality_transformer.image_modality_transformer.generic_image_modality_transformer import GenericImageModalityTransformer

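# Load the image from disk as an attachment.
image = Attachment.from_path("./school_backpack.jpg")

# Build the single converter and hand it to the transformer.
converter = LMBasedImageToCaption.from_preset("default")
transformer = GenericImageModalityTransformer(converter)

# transform() is asynchronous, so run it to completion with asyncio.run().
result = asyncio.run(transformer.transform(image.data, skip_routing=True))
print(result)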

Output:
