[BETA] Realtime Session


What’s a Realtime Session?

The realtime session is a unified interface designed to help you interact with language models that support realtime interactions. In this tutorial, you'll learn how to run a realtime session using the GoogleRealtimeSession module in just a few lines of code.

Prerequisites

This example specifically requires:

  1. Completion of all setup steps listed on the Prerequisites page.

  2. Setting a Gemini API key in the GOOGLE_API_KEY environment variable.

Installation

# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ gllm-inference

Quickstart

Let’s jump into a basic example using GoogleRealtimeSession.

from dotenv import load_dotenv
load_dotenv()

from gllm_inference.realtime_session import GoogleRealtimeSession
import asyncio

realtime_session = GoogleRealtimeSession(model_name="gemini-2.5-flash-native-audio-preview-12-2025")
asyncio.run(realtime_session.start())

Notice that after the realtime session starts, the following message appears in the console:

The conversation starts:

The realtime session modules use a set of input and output streamers to define the input sources and output destinations when interacting with the language model. By default, the following IO streamers are used:

  1. KeyboardInputStreamer : Sends text typed on the keyboard to the model.

  2. ConsoleOutputStreamer : Displays text outputs from the model in the console.

This means that by default, GoogleRealtimeSession supports text inputs and text outputs. Try typing through your keyboard to start interacting with the model!

Interaction Example:

When you're done, you can type /quit to end the conversation.

Ending the conversation:

IO Streamer Customization

Now that we've learned the basics, let's try other kinds of IO streamers! In the example below, we're going to use the LinuxMicInputStreamer and LinuxSpeakerOutputStreamer to enable audio input and output.

The conversation starts:

Try speaking through your microphone and have fun conversing with the language model in realtime!

After you're done, try combining them with our default IO streamers and see what happens!

Future Plans

In the future, more IO streamers may be added to allow for a more robust realtime experience. These may include, but are not limited to:

  1. Input streamers

    1. FileInputStreamer

    2. ScreenCaptureInputStreamer

    3. CameraInputStreamer

    4. WindowsMicInputStreamer

    5. MacMicInputStreamer

  2. Output streamers

    1. FileOutputStreamer

    2. WindowsSpeakerOutputStreamer

    3. MacSpeakerOutputStreamer
