Image


Image Data Generator is a component that processes image elements and generates data derived from their visual content.

Prerequisites

This example specifically requires completion of all setup steps listed on the Prerequisites page.

Installation

# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-docproc[image]"
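To confirm that the installation succeeded, you can run a quick check like the one below. It is a minimal sketch that uses only the standard library's importlib.util.find_spec. The module names gllm_docproc and gllm_multimodal come from the imports in the scripts on this page, and the helper check_modules is hypothetical.

```python
import importlib.util


def check_modules(names: list[str]) -> dict[str, bool]:
    """Report whether each top-level module can be found in the current environment."""
    # For a top-level name, find_spec returns None when the module is missing,
    # and it does so without importing the module.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Module names as used in the imports of the scripts on this page.
    for name, found in check_modules(["gllm_docproc", "gllm_multimodal"]).items():
        print(f"{name}: {'found' if found else 'NOT found'}")
```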

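You can use imageloader-output.json as a sample input file. The script below reads its input from ./data/source/input_elements.json, so save the sample file at that path or adjust the path in the script.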

Image Caption Data Generator

ImageCaptionDataGenerator is responsible for processing image elements and generating captions by leveraging BaseImageToCaption from gllm-multimodal.
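Conceptually, a caption data generator walks the list of elements and attaches a caption to each image element. The sketch below illustrates only that idea and is not the library's implementation. It assumes that elements are dictionaries, that image elements can be recognized by a "structure" value of "image", and that the caption is written into the element's "text" field. The function add_captions and its parameters are hypothetical, and caption_fn stands in for a real image-to-caption model.

```python
from typing import Any, Callable

Element = dict[str, Any]


def add_captions(
    elements: list[Element],
    caption_fn: Callable[[Element], str],
    # Assumption: image elements are marked with structure == "image".
    is_image: Callable[[Element], bool] = lambda e: e.get("structure") == "image",
) -> list[Element]:
    """Return copies of the elements with a caption written into each image element's text."""
    output: list[Element] = []
    for element in elements:
        new_element = dict(element)  # shallow copy so the input list is left unchanged
        if is_image(element):
            new_element["text"] = caption_fn(element)
        output.append(new_element)
    return output


if __name__ == "__main__":
    sample = [
        {"text": "", "structure": "image"},
        {"text": "Some paragraph.", "structure": "paragraph"},
    ]
    # A stub captioner; a real pipeline would call an image-to-caption model here.
    print(add_captions(sample, lambda e: "A placeholder caption"))
```

Passing the captioner in as a function keeps the traversal logic independent of which captioning model is used.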

1

Create a script called main.py:

import json
from gllm_multimodal.modality_converter.image_to_text.image_to_caption import LMBasedImageToCaption
from gllm_docproc.data_generator.image_data_generator import ImageCaptionDataGenerator

# Load the input elements to be processed
with open('./data/source/input_elements.json', 'r') as file:
    elements = json.load(file)

# Initialize the ImageCaptionDataGenerator with a preset image-to-caption model
image_to_caption = LMBasedImageToCaption.from_preset()
image_caption_data_generator = ImageCaptionDataGenerator(image_to_caption)

# Generate captions for image elements
output_elements = image_caption_data_generator.generate(elements)
print(output_elements)

2

Run the script:

python main.py

3

The data generator will produce the following output: output JSON.
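Instead of printing the result, you may want to keep it on disk for later pipeline steps. The sketch below assumes that the generated output is a JSON-serializable list of dictionaries, like the input elements loaded with json.load. The helper save_elements and the output path used in the usage note are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def save_elements(elements: list[dict[str, Any]], path: str) -> Path:
    """Write elements to a UTF-8 JSON file, creating parent directories as needed."""
    output_path = Path(path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open("w", encoding="utf-8") as file:
        # ensure_ascii=False keeps non-ASCII text (e.g. captions in other languages) readable.
        json.dump(elements, file, indent=2, ensure_ascii=False)
    return output_path
```

In main.py, you could replace print(output_elements) with save_elements(output_elements, './data/output/output_elements.json').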

Multi Model Image Caption Data Generator

MultiModelImageCaptionDataGenerator handles image captioning across multiple models, initializing each model lazily (only when it is first needed), by leveraging LMBasedImageToCaption from gllm-multimodal.
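Lazy initialization means a model is constructed only the first time it is actually requested, so models that are never used cost nothing to set up. The generic sketch below illustrates the pattern; it is not the API of MultiModelImageCaptionDataGenerator. The class LazyModelRegistry and the model names are hypothetical, and in practice the factories would build image-to-caption models.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class LazyModelRegistry(Generic[T]):
    """Builds each model on first request and reuses the same instance afterwards."""

    def __init__(self, factories: dict[str, Callable[[], T]]) -> None:
        self._factories = factories
        self._instances: dict[str, T] = {}

    def get(self, name: str) -> T:
        if name not in self._instances:
            # Construct the model only the first time it is requested.
            self._instances[name] = self._factories[name]()
        return self._instances[name]


if __name__ == "__main__":
    built: list[str] = []

    def make_factory(name: str) -> Callable[[], str]:
        def factory() -> str:
            built.append(name)  # record when a model is actually constructed
            return f"<model {name}>"
        return factory

    registry = LazyModelRegistry({n: make_factory(n) for n in ["model-a", "model-b"]})
    registry.get("model-a")
    registry.get("model-a")
    print(built)  # ['model-a']
```

Only "model-a" is built, and only once; "model-b" is never constructed because it is never requested.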

1

Create a script called main.py:

2

Run the script:

python main.py

3

The data generator will produce the following output: output JSON.
