Multimodal Input Handling

This guide walks you through adding multimodal input handling to your existing RAG pipeline, so that it can process more than just text inputs.


This tutorial extends the Your First RAG Pipeline tutorial. Ensure you have followed the instructions to set up your repository.

Prerequisites

This example specifically requires:

  1. Completion of the Your First RAG Pipeline tutorial, which this guide builds on directly

  2. Completion of all setup steps listed on the Prerequisites page

  3. A working OpenAI API key configured in your environment variables

You should be familiar with these concepts and components:

  1. Your First RAG Pipeline - Required foundation

View full project code on GitHub

Installation

# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-rag gllm-core gllm-generation gllm-inference gllm-pipeline gllm-retrieval gllm-misc gllm-datastore

How to Use this Guide

You can either:

  1. Download or copy the complete guide file(s) from the 📂 Complete Guide Files section at the end of this page to get everything ready instantly. You can refer back to the guide whenever you need an explanation or want to clarify how each part works.

  2. Follow along with each step to recreate the files yourself while learning about the components and how to integrate them.

Both options work. Choose based on whether you prefer speed or learning by doing!

Project Setup

1

Start From Your RAG Pipeline Project

Start with your completed RAG pipeline project from the Your First RAG Pipeline tutorial. This tutorial does not add any new files, so the structure stays as is:

<project-name>/
├── data/
│   ├── <index>/...                     # preset data index folder
│   ├── chroma.sqlite3                  # preset database file
│   └── imaginary_animals.csv           # sample data
├── modules/
│   ├── retriever.py
│   └── response_synthesizer.py
├── .env
├── indexer.py                    
└── pipeline.py    # 👈 Will be adjusted for multimodal input handling

1) Adding Multimodal Input Handling

Extending the Pipeline

Let's adjust the pipeline to handle multimodal inputs. In this tutorial, let's assume that the attachment files are passed as local paths through the pipeline state.

1

Define the extended state

Create a custom state that includes the attachment files as input as well as the extra contents list to be passed to the response synthesizer:
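A minimal sketch of such a state, assuming the pipeline state is a `TypedDict`. The base `RAGState` below is only a stand-in for the state from the previous tutorial, and the field names `attachments` and `extra_contents` are illustrative; adapt them to the names your pipeline actually uses.

```python
from typing import Any, TypedDict


class RAGState(TypedDict, total=False):
    """Stand-in for the base state from the previous tutorial (assumed fields)."""

    user_query: str
    context: str
    response: str


class MultimodalState(RAGState, total=False):
    """Extended state; both field names below are hypothetical."""

    attachments: list[str]      # local file paths supplied by the caller
    extra_contents: list[Any]   # objects later handed to the response synthesizer


# Example value that satisfies the extended state.
state: MultimodalState = {
    "user_query": "What animal is in this picture?",
    "attachments": ["data/sample.png"],
    "extra_contents": [],
}
```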

2

Create a function to create the extra contents

Our goal is to pass the input attachments as Attachment objects to the response synthesizer's extra_contents parameter. To do this, let's create a custom function.
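The function could look roughly like the sketch below. The `Attachment` dataclass here is a hypothetical stand-in so that the example runs on its own; in the real pipeline you would construct the library's own Attachment objects instead, and the function name `build_extra_contents` is likewise an assumption.

```python
import mimetypes
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Attachment:
    """Stand-in for the library's Attachment class (hypothetical fields)."""

    filename: str
    mime_type: str
    data: bytes


def build_extra_contents(paths: list[str]) -> list[Attachment]:
    """Load each local path into an Attachment, failing early on missing files."""
    contents = []
    for raw in paths:
        path = Path(raw)
        if not path.is_file():
            raise FileNotFoundError(f"Attachment not found: {path}")
        # Guess the MIME type from the file extension; fall back to a generic type.
        mime_type, _ = mimetypes.guess_type(path.name)
        contents.append(
            Attachment(path.name, mime_type or "application/octet-stream", path.read_bytes())
        )
    return contents
```

Raising on a missing file keeps the failure close to its cause instead of surfacing later as an LM invocation error.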

3

Update the response synthesizer with a new prompt

We'll update the response synthesizer with a prompt that exercises the multimodal functionality.
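As an illustration only, a prompt along these lines asks the model to use the attachment explicitly. The wording, the `{context}` and `{query}` placeholders, and the helper `render_prompts` are all assumptions; keep whatever placeholder names your existing response synthesizer expects.

```python
# Hypothetical prompt templates; the placeholder names are assumptions.
SYSTEM_PROMPT = (
    "You are a helpful assistant. Answer using the retrieved context and any "
    "attached files. If an attachment is provided, describe what it contains "
    "before answering.\n\nContext:\n{context}"
)
USER_PROMPT = "Question: {query}"


def render_prompts(context: str, query: str) -> tuple[str, str]:
    """Fill both templates; in the pipeline the synthesizer would do this step."""
    return SYSTEM_PROMPT.format(context=context), USER_PROMPT.format(query=query)
```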

4

Update the pipeline steps

Define the step that formats the extra contents, and pass the extra_contents parameter to the response synthesizer.
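The sketch below shows the idea with plain functions over a dictionary state. It does not use the pipeline library's step helpers; `format_extra_contents_step`, `synthesizer_step`, and the state keys are hypothetical names chosen for illustration.

```python
from typing import Any, Callable

State = dict[str, Any]


def format_extra_contents_step(build: Callable[[list[str]], list[Any]]) -> Callable[[State], State]:
    """Return a step that fills state['extra_contents'] from state['attachments']."""

    def step(state: State) -> State:
        return {**state, "extra_contents": build(state.get("attachments", []))}

    return step


def synthesizer_step(synthesize: Callable[..., str]) -> Callable[[State], State]:
    """Return a step that forwards extra_contents to the synthesizer callable."""

    def step(state: State) -> State:
        response = synthesize(
            query=state["user_query"],
            context=state.get("context", ""),
            extra_contents=state.get("extra_contents", []),
        )
        return {**state, "response": response}

    return step
```

Each step returns a new state rather than mutating its input, which makes the data flow between steps easy to inspect.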

5

Compose the final pipeline

Chain all steps to create the complete multimodal pipeline:
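Conceptually, the chaining amounts to running the steps in order on a shared state. The sketch below uses a hypothetical `compose` helper and placeholder steps rather than the pipeline library's own composition syntax, which may differ.

```python
from functools import reduce
from typing import Any, Callable

State = dict[str, Any]
Step = Callable[[State], State]


def compose(*steps: Step) -> Step:
    """Build one callable that runs the steps left to right on the state."""

    def run(state: State) -> State:
        return reduce(lambda acc, step: step(acc), steps, state)

    return run


# Placeholder steps; each stands in for a real component from this tutorial.
def retrieve(state: State) -> State:
    return {**state, "context": f"chunks for: {state['user_query']}"}


def format_extra_contents(state: State) -> State:
    return {**state, "extra_contents": list(state.get("attachments", []))}


def synthesize(state: State) -> State:
    count = len(state["extra_contents"])
    return {**state, "response": f"answer using {count} attachment(s)"}


pipeline = compose(retrieve, format_extra_contents, synthesize)
```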

This creates a pipeline that can handle multimodal input files.

2) Run the Pipeline


When running the pipeline, you may encounter an error like this:

[2025-08-26T14:36:10+0700.550 chromadb.telemetry.product.posthog ERROR] Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given

You can safely ignore this error, since this tutorial does not use Chroma's telemetry feature. Your pipeline should still work.

Now that the pipeline is all set, let's try it!

1

Configure the pipeline state for testing
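A test state might be set up as in the sketch below. The key names, the sample path, and the synchronous call style are assumptions, and `echo_pipeline` is only a placeholder so that the example runs on its own; if your pipeline object is asynchronous, the call would need to be awaited instead.

```python
from typing import Any, Callable

State = dict[str, Any]


def make_initial_state(query: str, attachments: list[str]) -> State:
    """Build the input state; the key names are illustrative."""
    return {"user_query": query, "attachments": attachments, "extra_contents": []}


def run_once(pipeline: Callable[[State], State], state: State) -> str:
    """Run the pipeline on one state and return the generated response text."""
    return pipeline(state)["response"]


def echo_pipeline(state: State) -> State:
    """Placeholder pipeline so the sketch runs without the real components."""
    return {**state, "response": f"received {len(state['attachments'])} attachment(s)"}


state = make_initial_state(
    "Which imaginary animal does this picture resemble?",
    ["data/sample_image.png"],  # hypothetical path; point it at a real local file
)
print(run_once(echo_pipeline, state))
```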

And that's it! Your pipeline should now be able to handle the attached multimodal files!

Troubleshooting

  1. Attachment loading fails:

    1. Ensure that the file exists in your local path.

    2. Ensure that the path is valid. Check whether you are using an absolute or a relative path; relative paths are resolved against the directory you run the pipeline from.

  2. LM invocation fails:

    1. Ensure that the model you're using supports the attachment type and extension.

    2. Ensure that the attachment does not exceed the model's size or token limit.
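Most of the checks above can be run before invoking the pipeline. The sketch below is one way to do that; the supported extensions and the byte limit are illustrative values, not documented limits of any particular model, and file size is only a rough proxy for token usage.

```python
from pathlib import Path

MAX_BYTES = 5 * 1024 * 1024  # illustrative limit, not a documented model constraint
SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".pdf"}  # assumed; check your model's docs


def check_attachment(raw_path: str, max_bytes: int = MAX_BYTES) -> list[str]:
    """Return a list of problems found; an empty list means the file looks usable."""
    # Resolve relative paths against the current working directory.
    path = Path(raw_path).expanduser().resolve()
    if not path.is_file():
        return [f"{path} does not exist or is not a file"]
    problems = []
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        problems.append(f"{path.suffix or '(no extension)'} may not be supported by the model")
    if path.stat().st_size > max_bytes:
        problems.append(f"{path.name} is larger than {max_bytes} bytes")
    return problems
```

Printing the resolved path in the error message makes it easy to spot a relative path that points somewhere unexpected.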


Congratulations! You've successfully enhanced your RAG pipeline with multimodal input handling, allowing your application to process more than just text inputs!
