Image to Mermaid

Introduction

The Image to Mermaid component converts flowchart and diagram images into Mermaid syntax using multimodal LLMs. It analyzes visual structures (nodes, shapes, connectors) and generates valid Mermaid code that preserves the diagram's logic and relationships.
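To make the target format concrete, the sketch below shows how nodes, shapes, and connectors map onto Mermaid flowchart text. It is purely illustrative: `SHAPES` and `to_mermaid` are hypothetical names, not part of gllm-multimodal, and the component itself relies on an LLM rather than a rule-based mapping like this one.

```python
# Illustrative only: maps a small diagram description to Mermaid flowchart text.
# SHAPES and to_mermaid are hypothetical helpers, not part of gllm-multimodal.

SHAPES = {
    "process": ("[", "]"),     # rectangle
    "decision": ("{", "}"),    # diamond
    "terminal": ("([", "])"),  # stadium-shaped terminator
}


def to_mermaid(nodes, edges, direction="TD"):
    """Render nodes and labeled edges as a Mermaid flowchart string."""
    lines = [f"flowchart {direction}"]
    # One line per node: id followed by the label wrapped in shape delimiters.
    for node_id, (label, shape) in nodes.items():
        left, right = SHAPES[shape]
        lines.append(f"    {node_id}{left}{label}{right}")
    # One line per connector, with an optional edge label.
    for src, dst, label in edges:
        arrow = f"-->|{label}|" if label else "-->"
        lines.append(f"    {src} {arrow} {dst}")
    return "\n".join(lines)


nodes = {
    "A": ("Start", "terminal"),
    "B": ("Valid input?", "decision"),
    "C": ("Process", "process"),
}
edges = [("A", "B", None), ("B", "C", "Yes"), ("B", "A", "No")]
print(to_mermaid(nodes, edges))
```

The printed text can be pasted into any Mermaid renderer to view the corresponding diagram.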

Installation

# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-multimodal" 

Quickstart

The simplest way to initialize the Image to Mermaid component is to use the built-in preset.

import asyncio

from gllm_inference.schema import Attachment
from gllm_multimodal.modality_converter.image_to_text.image_to_mermaid import LMBasedImageToMermaid

image = Attachment.from_path("./flowchart.jpg")
converter = LMBasedImageToMermaid.from_preset("default")
mermaid = asyncio.run(converter.convert(image.data))
print(f"Mermaid Syntax: \n{mermaid.result}")

Output:

Customize Model

When using a preset, you can change the image-to-mermaid model by passing model_id in lm_invoker_kwargs to the from_preset() function.
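A minimal sketch of this override follows. It assumes that from_preset() accepts lm_invoker_kwargs as a keyword argument, as the sentence above suggests. The helper names `preset_kwargs` and `make_converter` are hypothetical, and "your-model-id" is a placeholder for whichever model identifier your setup supports.

```python
def preset_kwargs(model_id: str) -> dict:
    """Build keyword arguments for from_preset() that override the model."""
    return {"lm_invoker_kwargs": {"model_id": model_id}}


def make_converter(model_id: str, preset: str = "default"):
    """Create a converter from a preset with a different model (hypothetical helper)."""
    # Imported lazily so the snippet can be read and tested without the package installed.
    from gllm_multimodal.modality_converter.image_to_text.image_to_mermaid import (
        LMBasedImageToMermaid,
    )

    # Assumption: lm_invoker_kwargs is passed to from_preset() as a keyword argument.
    return LMBasedImageToMermaid.from_preset(preset, **preset_kwargs(model_id))


# "your-model-id" is a placeholder, not a real model identifier.
print(preset_kwargs("your-model-id"))
```

Calling `make_converter("your-model-id")` would then return a converter that can be used exactly as in the Quickstart.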

Customize Model and Prompt

Using a custom LM Request Processor allows you to customize the model, the prompt, or both.
