Embedding Model (EM) Invoker


What’s an EM Invoker?

The EM invoker is a unified interface designed to help you convert inputs into numerical vector representations. In this tutorial, you'll learn how to invoke an embedding model using OpenAIEMInvoker in just a few lines of code. You can also explore other types of EM Invokers, available here.

Prerequisites

This example specifically requires completion of all setup steps listed on the Prerequisites page.

Installation

# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ gllm-inference

Quickstart

Let’s jump into a basic example using OpenAIEMInvoker. We’ll embed a short piece of text and print the resulting vector.

import asyncio
from gllm_inference.em_invoker import OpenAIEMInvoker
from gllm_inference.model import OpenAIEM

em_invoker = OpenAIEMInvoker(OpenAIEM.TEXT_EMBEDDING_3_SMALL)
response = asyncio.run(em_invoker.invoke("Hello world!"))
print(f"Vectorized text:\n{response}")

Expected Output

That’s it! You've just made your first successful embedding model call using OpenAIEMInvoker. Fast, clean, and ready to scale into more complex use cases!

Multimodal Input

Some embedding model providers, such as Voyage, can vectorize more than just text! Let's try to embed an image using VoyageEMInvoker. First, get a Voyage API key and export it as an environment variable.

Then, we can embed multimodal inputs such as images by loading them as Attachment objects!

Expected Output

And there it is, you've successfully vectorized an image into a numerical vector representation!

Multiple Inputs

EM invokers can also be used to vectorize multiple inputs at once. This can be done by providing a list of inputs. When processing a list of inputs, the output will be a list of vectors, where each element corresponds to an element in the input list. Let's try it!

Expected Output

Retry & Timeout

Retry & timeout functionality provides robust error handling and reliability for embedding model interactions. It allows you to automatically retry failed requests and set time limits for operations, ensuring your applications remain responsive and resilient to network issues or API failures.

Retry & timeout can be configured via the RetryConfig class' parameters:

  1. max_retries: Maximum number of retry attempts (defaults to 3).

  2. timeout: Maximum time in seconds to wait for each request (defaults to 30.0 seconds). To disable the timeout, set this parameter to 0.0.

You can also configure other parameters available here. Now let's try to apply it to our EM invoker!

Text Truncation

Text truncation allows you to control how text inputs are handled when they exceed the maximum length supported by the embedding model. This is particularly useful when dealing with long documents or when you need to ensure consistent input lengths.

Truncation can be configured using the TruncationConfig class with the following parameters:

  1. max_length: Maximum number of characters to keep (required).

  2. truncate_side: Which side to truncate from (defaults to TruncateSide.RIGHT).

    • TruncateSide.LEFT: Keep the end of the text, truncate from the beginning.

    • TruncateSide.RIGHT: Keep the beginning of the text, truncate from the end (default).

And there we go! You've successfully completed the tutorial on using EM invokers!
