Multimodality

Overview

Multimodal is a library for processing multimodal content in Generative AI applications. It provides two fundamental capabilities:

  1. Modality Converter: Transforms data from one modality to another (e.g., audio → text, image → text).

  2. Modality Transformer: Enriches a user source with additional information derived from other modalities, such as images, video, or audio. It orchestrates one or more converters to add meaningful context to the query.
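The relationship between the two capabilities can be illustrated with a minimal sketch. Everything named here (`Converter`, `Attachment`, `ModalityTransformer`, `transform`, and the two stand-in converter functions) is a hypothetical illustration, not the library's actual API. The stand-ins return placeholder strings instead of calling a real speech-to-text or captioning model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical type: a converter maps raw bytes of one modality to text.
Converter = Callable[[bytes], str]


def fake_audio_to_text(data: bytes) -> str:
    """Stand-in for an audio -> text converter (no real transcription)."""
    return f"[transcript of {len(data)} audio bytes]"


def fake_image_to_text(data: bytes) -> str:
    """Stand-in for an image -> text converter (no real captioning)."""
    return f"[caption of {len(data)} image bytes]"


@dataclass
class Attachment:
    """A non-text input accompanying the query (hypothetical type)."""
    modality: str  # e.g. "audio" or "image"
    data: bytes


@dataclass
class ModalityTransformer:
    """Orchestrates converters to enrich a query (assumed design)."""
    converters: Dict[str, Converter] = field(default_factory=dict)

    def transform(self, query: str, attachments: List[Attachment]) -> str:
        context = []
        for attachment in attachments:
            converter = self.converters.get(attachment.modality)
            if converter is None:
                continue  # no converter registered for this modality
            context.append(converter(attachment.data))
        if not context:
            return query  # nothing to add; return the query unchanged
        lines = "\n".join(f"- {item}" for item in context)
        return f"{query}\n\nContext:\n{lines}"


transformer = ModalityTransformer(
    converters={"audio": fake_audio_to_text, "image": fake_image_to_text}
)
enriched = transformer.transform(
    "What is shown here?", [Attachment("image", b"\x00" * 4)]
)
print(enriched)
```

In this sketch the transformer holds no conversion logic of its own. It only selects a converter by modality and appends the result to the query, so a new modality can be supported by registering another converter.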
