Image to Mermaid
Introduction
The Image to Mermaid component converts flowchart and diagram images into Mermaid syntax using multimodal LLMs. It analyzes visual structures (nodes, shapes, connectors) and generates valid Mermaid code that preserves the diagram's logic and relationships.
Installation
You can install the package inside a Conda environment if you prefer. Use the command that matches your shell.

Linux/macOS:

```bash
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-multimodal"
```

Windows PowerShell:

```powershell
$token = (gcloud auth print-access-token)
pip install --extra-index-url "https://oauth2accesstoken:$token@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-multimodal"
```

Windows Command Prompt:

```bat
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "gllm-multimodal"
```

Quickstart
The simplest way to initialize the Image to Mermaid component is to use the built-in preset.
```python
import asyncio
from gllm_inference.schema import Attachment
from gllm_multimodal.modality_converter.image_to_text.image_to_mermaid import LMBasedImageToMermaid

# Load the diagram image to convert
image = Attachment.from_path("./flowchart.jpg")
# Build a converter from the built-in "default" preset
converter = LMBasedImageToMermaid.from_preset("default")
# convert() is a coroutine, so drive it with asyncio.run()
mermaid = asyncio.run(converter.convert(image.data))
print(f"Mermaid Syntax: \n{mermaid.result}")
```
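If your code already runs inside an event loop (for example, a Jupyter notebook or an async web handler), `asyncio.run()` raises a `RuntimeError`, so await `convert()` directly instead. The sketch below is a hypothetical helper, not part of the package: it accepts any object with an async `convert()` whose return value has a `.result` attribute, as in the Quickstart. It also assumes that a single converter instance can safely handle concurrent calls, which this page does not confirm.

```python
import asyncio
from typing import Any, Iterable


async def images_to_mermaid(converter: Any, images: Iterable[bytes]) -> list[str]:
    """Convert several images and return their Mermaid strings in input order.

    Hypothetical helper. `converter` is expected to behave like
    LMBasedImageToMermaid: an async convert(data) whose return value exposes
    a `.result` attribute. Assumes concurrent convert() calls are safe.
    """
    # Schedule all conversions at once; gather() returns results in input order
    outputs = await asyncio.gather(*(converter.convert(data) for data in images))
    return [output.result for output in outputs]


# Usage inside an already-running event loop (not executed here):
#     image = Attachment.from_path("./flowchart.jpg")
#     converter = LMBasedImageToMermaid.from_preset("default")
#     [mermaid] = await images_to_mermaid(converter, [image.data])
```

If concurrent calls turn out not to be safe for your model provider, awaiting `converter.convert(data)` one image at a time in a loop is the conservative alternative.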
Customize Model
When using a preset, you can change the image-to-mermaid model by passing `model_id` inside `lm_invoker_kwargs` to the `from_preset()` function.
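A minimal sketch of that override. Only the `lm_invoker_kwargs={"model_id": ...}` argument to `from_preset()` comes from this page. The value `"openai/gpt-4o"` is a hypothetical placeholder (use an identifier your gllm-inference setup supports), and `PRESET_KWARGS` and `image_to_mermaid_with_model` are illustrative names, not part of the package. The imports are deferred into the function so the snippet can be loaded without gllm-multimodal installed.

```python
import asyncio

# Hypothetical model identifier; replace it with one your deployment supports.
MODEL_ID = "openai/gpt-4o"

# Keyword arguments for from_preset(): model_id goes inside lm_invoker_kwargs.
PRESET_KWARGS = {"lm_invoker_kwargs": {"model_id": MODEL_ID}}


def image_to_mermaid_with_model(image_path: str) -> str:
    """Convert a diagram image with the "default" preset, overriding its model."""
    # Deferred imports: the package is only needed when the function is called.
    from gllm_inference.schema import Attachment
    from gllm_multimodal.modality_converter.image_to_text.image_to_mermaid import (
        LMBasedImageToMermaid,
    )

    image = Attachment.from_path(image_path)
    converter = LMBasedImageToMermaid.from_preset("default", **PRESET_KWARGS)
    output = asyncio.run(converter.convert(image.data))
    return output.result


# Example (requires gllm-multimodal and access to the chosen model):
#     print(image_to_mermaid_with_model("./flowchart.jpg"))
```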
Customize Model and Prompt
Using a custom LM Request Processor lets you customize the model, the prompt, or both.