Multimodality

Multimodal is a library for processing multimodal content in Generative AI applications. It provides two core capabilities:

Modality Converter: Transforms data from one modality to another (e.g., audio → text, image → text).
Modality Transformer: Enriches a user source with additional information derived from other modalities, such as images, videos, or audio. It orchestrates one or more converters to add meaningful context to the query.
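
To make the relationship between the two capabilities concrete, here is a minimal sketch of how a transformer might orchestrate converters. Every name in it (Attachment, Source, Converter, ModalityTransformer, and the stub converter functions) is hypothetical and chosen for illustration only; none of it is the library's actual API, and the stubs stand in for real speech-to-text or captioning models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical types for illustration; not the library's actual API.
@dataclass
class Attachment:
    modality: str  # e.g. "audio" or "image"
    payload: bytes


@dataclass
class Source:
    query: str
    attachments: List[Attachment] = field(default_factory=list)


# A converter maps raw content of one modality to text.
Converter = Callable[[bytes], str]


class ModalityTransformer:
    """Enriches a source by running the matching converter on each attachment."""

    def __init__(self, converters: Dict[str, Converter]) -> None:
        self.converters = converters

    def transform(self, source: Source) -> str:
        context = []
        for att in source.attachments:
            converter = self.converters.get(att.modality)
            if converter is None:
                continue  # no converter registered for this modality; skip it
            context.append(f"[{att.modality}] {converter(att.payload)}")
        if not context:
            return source.query
        return source.query + "\n\nContext:\n" + "\n".join(context)


# Stub converters standing in for real models (assumption, not real behavior).
def fake_audio_to_text(payload: bytes) -> str:
    return f"transcript of {len(payload)} bytes of audio"


def fake_image_to_text(payload: bytes) -> str:
    return f"caption for {len(payload)} bytes of image"


transformer = ModalityTransformer(
    {"audio": fake_audio_to_text, "image": fake_image_to_text}
)
source = Source(
    "What is discussed here?",
    [Attachment("audio", b"\x00" * 4), Attachment("video", b"\x00")],
)
enriched = transformer.transform(source)
print(enriched)
```

In this sketch the converters are plain functions keyed by modality, so supporting a new conversion only requires registering another entry. Attachments with no registered converter (the video attachment here) are skipped rather than treated as errors; a real implementation might choose to raise or log instead.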

Today, the following modality conversions are available:

