Multimodality
Overview
Multimodal is a library for processing multimodal content in Generative AI applications. It provides two fundamental capabilities:
Modality Converter: Transforms data from one modality to another (e.g., audio → text, image → text).
Modality Transformer: Enriches a user source with additional information derived from other modalities, such as images, videos, or audio. It orchestrates one or more converters to add meaningful context to the query.
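The relationship between the two capabilities can be illustrated with a minimal sketch. This is not the library's actual API: `Attachment`, `ModalityTransformer`, `fake_image_to_text`, and `fake_audio_to_text` are hypothetical names chosen for illustration, and the converters return placeholder strings instead of calling a real captioning or transcription model. The sketch only shows the idea of a transformer dispatching each attachment to a converter and appending the results to the query as context.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not the library's real interface.


@dataclass
class Attachment:
    modality: str   # e.g. "image" or "audio"
    payload: bytes  # raw content of the attachment


# A converter maps raw content of one modality to text.
Converter = Callable[[bytes], str]


def fake_image_to_text(payload: bytes) -> str:
    # Stand-in for an image -> text converter (e.g. a captioning model).
    return f"[image caption placeholder, {len(payload)} bytes]"


def fake_audio_to_text(payload: bytes) -> str:
    # Stand-in for an audio -> text converter (e.g. a transcription model).
    return f"[audio transcript placeholder, {len(payload)} bytes]"


@dataclass
class ModalityTransformer:
    # Maps a modality name to the converter that handles it.
    converters: Dict[str, Converter] = field(default_factory=dict)

    def transform(self, query: str, attachments: List[Attachment]) -> str:
        """Return the query enriched with text derived from the attachments."""
        context = []
        for attachment in attachments:
            converter = self.converters.get(attachment.modality)
            if converter is None:
                # No converter registered for this modality: skip it.
                continue
            context.append(converter(attachment.payload))
        if not context:
            return query
        return query + "\n\nContext:\n" + "\n".join(context)


if __name__ == "__main__":
    transformer = ModalityTransformer(
        converters={"image": fake_image_to_text, "audio": fake_audio_to_text}
    )
    enriched = transformer.transform(
        "What is shown here?",
        [Attachment("image", b"\x89PNG"), Attachment("audio", b"RIFF")],
    )
    print(enriched)
```

In this sketch, attachments with no registered converter are silently skipped; a real implementation might instead raise an error or log a warning.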