Embedding Model (EM) Invoker


What’s an EM Invoker?

The EM invoker is a unified interface designed to help you convert inputs into numerical vector representations. In this tutorial, you'll learn how to invoke an embedding model using OpenAIEMInvoker in just a few lines of code. You can also explore other types of EM Invokers, available here.

Prerequisites

This example specifically requires completion of all setup steps listed on the Prerequisites page.

Installation

# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ gllm-inference

Quickstart

Let’s jump into a basic example using OpenAIEMInvoker. We’ll embed a short piece of text and print the resulting vector.

import asyncio
from gllm_inference.em_invoker import OpenAIEMInvoker
from gllm_inference.model import OpenAIEM

em_invoker = OpenAIEMInvoker(OpenAIEM.TEXT_EMBEDDING_3_SMALL)
response = asyncio.run(em_invoker.invoke("Hello world!"))
print(f"Vectorized text:\n{response}")

Expected Output

That’s it! You've just made your first successful embedding model call using OpenAIEMInvoker. Fast, clean, and ready to scale into more complex use cases!

Multimodal Input

Some embedding model providers, such as Voyage, can vectorize more than just text! Let's try to embed an image using VoyageEMInvoker. First, get a Voyage API key and export it as an environment variable.

Then, we can embed multimodal inputs such as images by loading them as Attachment objects!

Expected Output

And there it is, you've successfully vectorized an image into a numerical vector representation!

Multiple Inputs

EM invokers can also be used to vectorize multiple inputs at once. This can be done by providing a list of inputs. When processing a list of inputs, the output will be a list of vectors, where each element corresponds to an element in the input list. Let's try it!

Expected Output

Retry & Timeout

Retry & timeout functionality provides robust error handling and reliability for embedding model interactions. It allows you to automatically retry failed requests and set time limits for operations, ensuring your applications remain responsive and resilient to network issues or API failures.

Retry & timeout can be configured via the RetryConfig class' parameters:

  1. max_retries: Maximum number of retry attempts (defaults to 3).

  2. timeout: Maximum time in seconds to wait for each request (defaults to 30.0 seconds). To disable the timeout, set this parameter to 0.0.

You can also configure other parameters available here. Now let's try to apply it to our EM invoker!

Text Truncation

Text truncation allows you to control how text inputs are handled when they exceed the maximum length supported by the embedding model. This is particularly useful when dealing with long documents or when you need to ensure consistent input lengths.

Truncation can be configured using the TruncationConfig class with the following parameters:

  1. max_length: Maximum number of characters to keep (required).

  2. truncate_side: Which side to truncate from (defaults to TruncateSide.RIGHT).

    • TruncateSide.LEFT: Keep the end of the text, truncate from the beginning.

    • TruncateSide.RIGHT: Keep the beginning of the text, truncate from the end (default).

And there we go! You've successfully completed the tutorial on using EM invokers!
