# Audio Interface
Add audio input/output to AIP agents using one interface that supports multiple implementations.
- One interface: `create_audio_session(...)` in `glaip-sdk`
- Many implementations: provider-specific session backends under the same API
- Current implementation: `livekit` (available now)
- Use-case docs: meeting-specific integrations are documented separately
{% hint style="info" %} Audio interface is beta and local-only. You must run the LiveKit server and client yourself. The CLI does not expose audio sessions yet. This page documents a design preview; APIs and behavior may change before release. {% endhint %}
## Interface First (Provider-Agnostic)
The entrypoint stays the same across providers: `create_audio_session(...)`. Implementation selection is explicit: pass `implementation="..."`. `config["provider"]` is a compatibility fallback when `implementation` is omitted. If both are omitted, session creation raises `ValueError`.
### Hypothetical provider example
This shows the interface shape before choosing a concrete transport:
```python
import asyncio

from glaip_sdk import Agent


async def main() -> None:
    agent = Agent(name="my-agent", instruction="You are a helpful assistant.")
    session = agent.create_audio_session(
        implementation="my-provider",
        config={
            "io": {"input_enabled": True, "output_enabled": True},
            "my_provider": {"endpoint": "...", "token": "..."},
        },
    )
    await session.run()


if __name__ == "__main__":
    asyncio.run(main())
```

## SDK Usage (Minimum)
Use this as the smallest working snippet in glaip-sdk:
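A minimal sketch of that snippet, assuming the `livekit` implementation; the server URL is a local-development placeholder, and credentials use the `${ENV_VAR}` reference style described later on this page:

```python
import asyncio

from glaip_sdk import Agent


async def main() -> None:
    agent = Agent(name="my-agent", instruction="You are a helpful assistant.")
    # Explicit implementation selection; provider settings live under config["livekit"].
    session = agent.create_audio_session(
        implementation="livekit",
        config={
            "livekit": {
                "url": "ws://localhost:7880",  # placeholder local LiveKit server
                "api_key": "${LIVEKIT_API_KEY}",
                "api_secret": "${LIVEKIT_API_SECRET}",
            },
        },
    )
    await session.run()


if __name__ == "__main__":
    asyncio.run(main())
```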
This call is intentionally explicit. The following fails because no implementation is provided:
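A sketch of the failing call: neither `implementation="..."` nor the `config["provider"]` fallback is given, so session creation raises `ValueError` per the interface contract above:

```python
from glaip_sdk import Agent

agent = Agent(name="my-agent", instruction="You are a helpful assistant.")

# Neither implementation= nor config["provider"] is set, so this raises ValueError.
session = agent.create_audio_session(config={"io": {"input_enabled": True}})
```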
### Custom implementation wiring (same interface)
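A sketch of registering a custom backend via `register_audio_session_implementation("name", factory)` (the extension hook noted later on this page); the import path and the factory signature shown here are assumptions, not confirmed API:

```python
from glaip_sdk import Agent
from glaip_sdk.audio import register_audio_session_implementation  # import path is an assumption


def my_provider_factory(agent, config):
    """Build and return an audio session for this agent.

    The returned object is assumed to expose the same run/start/stop
    lifecycle as the built-in session implementations.
    """
    ...


# Register once at startup; afterwards the standard interface selects it by name.
register_audio_session_implementation("my-provider", my_provider_factory)

agent = Agent(name="my-agent", instruction="You are a helpful assistant.")
session = agent.create_audio_session(implementation="my-provider", config={})
```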
## Architecture Overview (Interface + Providers)
This diagram stays interface-level (provider architecture), not product-specific meeting workflows.
## How AIP Uses the Audio Interface (General)
For any provider, the runtime flow in AIP is:

1. Select the implementation with `implementation="..."`.
2. Pass provider-specific options under `config`.
3. Run the returned session via `await session.run()`.
`config["provider"]` is optional and mainly useful for compatibility paths that cannot pass `implementation` directly. When `implementation` is explicit, `config["provider"]` is redundant.
## Current AIP Implementation: LiveKit
LiveKit is the implementation available now in AIP. The section below covers LiveKit-specific knobs and precedence behavior.
### LiveKit Customization Surface
This table lists the configurable fields for the current AIP audio implementation (`implementation="livekit"`).
| Field | Required | Default | Description |
| --- | --- | --- | --- |
| `io.input_enabled` | No | `True` | Enable/disable microphone input processing. |
| `io.output_enabled` | No | `True` | Enable/disable spoken audio output. |
| `io.input_device` | No | `None` | Select a specific input device name/id. |
| `io.output_device` | No | `None` | Select a specific output device name/id. |
| `livekit.url` | Yes | (none) | LiveKit server URL used by the session. |
| `livekit.api_key` | Yes* | `LIVEKIT_API_KEY` env | LiveKit API key for token generation. |
| `livekit.api_secret` | Yes* | `LIVEKIT_API_SECRET` env | LiveKit API secret for token generation. |
| `livekit.room_name` | No | auto-generated room name | Target room for the audio session. |
| `livekit.identity` | No | `aip-agent` | Identity used by the agent participant in the room. |
| `livekit.openai_stt_model` | No | `gpt-4o-transcribe` | STT model for the OpenAI speech recognition plugin. |
| `livekit.openai_tts_model` | No | `gpt-4o-mini-tts` | TTS model for the OpenAI speech synthesis plugin. |
| `livekit.openai_voice` | No | `echo` | Voice preset for OpenAI TTS. |
| `livekit.openai_use_realtime_stt` | No | `True` | Toggle realtime STT mode in the OpenAI plugin. |
| `livekit.openai_api_key` | No | `OPENAI_API_KEY` env | Override API key for OpenAI STT/TTS plugin calls. |
| `livekit.openai_base_url` | No | provider default | Override the OpenAI base URL for plugin calls. |
Note: fields marked Yes* are required at runtime but can come from either `config` or an environment variable.
Optional fallback keys:

| Key | When it applies | Effect |
| --- | --- | --- |
| `model.provider` | Deciding whether OpenAI STT/TTS should be enabled | `"openai"` activates the OpenAI speech path even without `livekit.openai_*` fields. |
| `model.model` | `livekit.openai_tts_model` is not set | Used as the fallback TTS model value. |
| `model.voice` | `livekit.openai_voice` is not set | Used as the fallback voice value. |
Custom provider extension remains available via `register_audio_session_implementation("name", factory)`.
Meeting-specific actor integrations (for example Google Meet + Meemo + Attendee stream bridge) are documented in a separate use-case page.
## Key Terms
- Audio session: Runtime object returned by `create_audio_session(...)` that manages the start/stop/wait lifecycle for voice interaction.
- Provider / implementation: Transport backend selected by `implementation="..."` (for example `"livekit"`).
- LiveKit provider: Current AIP audio implementation using the LiveKit Python SDK and the LiveKit Agents runtime.
- AIP runtime: Agent reasoning and tool-calling execution path that handles transcript input and generates reply text.
## LiveKit Example (Explicit Selection)
Use explicit implementation selection and keep provider-specific settings under `config["livekit"]`:
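A hedged sketch of such a call; the server URL is a local-development placeholder, and the OpenAI model fields repeat the documented defaults so `config["model"]` can be omitted entirely:

```python
import asyncio

from glaip_sdk import Agent


async def main() -> None:
    agent = Agent(name="my-agent", instruction="You are a helpful assistant.")
    session = agent.create_audio_session(
        implementation="livekit",
        config={
            "io": {"input_enabled": True, "output_enabled": True},
            "livekit": {
                "url": "ws://localhost:7880",  # placeholder local LiveKit server
                "api_key": "${LIVEKIT_API_KEY}",
                "api_secret": "${LIVEKIT_API_SECRET}",
                # Provider-specific STT/TTS selection; no config["model"] needed.
                "openai_stt_model": "gpt-4o-transcribe",
                "openai_tts_model": "gpt-4o-mini-tts",
                "openai_voice": "echo",
            },
        },
    )
    await session.run()


if __name__ == "__main__":
    asyncio.run(main())
```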
For the LiveKit implementation, STT/TTS model selection is configured under `config["livekit"]` (`openai_stt_model`, `openai_tts_model`, `openai_voice`). `config["model"]` is optional and used only as a fallback when provider-specific fields are omitted.
## Model Config Precedence (`model` vs `livekit`)

`config["model"]` is not required for `implementation="livekit"`. OpenAI STT/TTS wiring activates when either:

- `config["model"]["provider"] == "openai"`, or
- any `config["livekit"]["openai_*"]` field is set.

Precedence for TTS model and voice:

- TTS model: `livekit.openai_tts_model` -> `model.model` -> default `gpt-4o-mini-tts`
- Voice: `livekit.openai_voice` -> `model.voice` -> default `echo`
Practical recommendation: if you already set `openai_stt_model`, `openai_tts_model`, and `openai_voice` under `livekit`, you can omit `config["model"]` to avoid duplication.
Use `${ENV_VAR}` references for secrets in runnable config (supported by `AudioSessionConfig` parsing). Use `<SECRET>` placeholders only in non-runnable documentation examples.
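For example, a config fragment using environment references (a sketch; key names follow the customization table above, and the URL is a placeholder):

```python
# Secrets stay out of the file; ${ENV_VAR} references are resolved
# from the environment when the config is parsed (per AudioSessionConfig).
livekit_config = {
    "livekit": {
        "url": "ws://localhost:7880",           # placeholder local server URL
        "api_key": "${LIVEKIT_API_KEY}",        # resolved from the environment
        "api_secret": "${LIVEKIT_API_SECRET}",  # resolved from the environment
    },
}
```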
LiveKit prerequisites, setup commands, and local test flow are currently captured in the repository's LiveKit local development runbook.
## Provider Model
The audio interface is provider-agnostic. Use `implementation="..."` to pick the backend (for example `"livekit"` today). Provider-specific settings are passed via `config`.
Current AIP implementation support: LiveKit `AgentSession`-based local audio sessions.
## Turn Sequence (Audio -> STT -> AIP -> TTS)
The AIP turn logic is consistent across providers. Transport and streaming APIs change by provider.
## Tool Call Visibility
Tool calls are handled by the underlying agent runtime (e.g. LangGraph) the same way they are for text-only runs.
For the demo workflow in this repo:
- run with `AIP_AUDIO_DEBUG=1` to print transcripts and final replies
- use the agent's standard streaming/logging to observe tool events
## Configuration Tips
- Audio input/output: Set `input_enabled` or `output_enabled` to `False` to run input-only or output-only sessions.
- Devices: Supply `input_device` or `output_device` when multiple audio devices are present.
- STT/TTS: Provider-specific. LiveKit handles audio transport; transcription and synthesis live in the LiveKit worker/agent. Providers that expose model selection use `AudioModelConfig` (see the GL SDK realtime session tutorial).
- Provider config: `LiveKitConfig` expects the server URL, `api_key`, and `api_secret`; `room_name` and `identity` are optional.
## Limitations
- Local-only; no AIP-hosted audio service yet.
- `livekit` is the currently documented AIP provider implementation.
- CLI support is intentionally deferred.
## Troubleshooting
| Error | Likely cause | Fix |
| --- | --- | --- |
| `AudioSessionUnavailableError` | LiveKit deps are missing | Install published extras: `pip install "glaip-sdk[audio]"`. Monorepo contributors can run `make -C python/aip-agents install-audio`. |
| `AudioConfigError` | URL/API key/secret missing | Check `LiveKitConfig` and env vars `LIVEKIT_API_KEY` / `LIVEKIT_API_SECRET`. |
| No audio / device error | Device not available | Disable audio output or set `input_device`/`output_device`. |
## Related Documentation
- Agents guide — manage agent configs and runtime overrides.
- Tools guide — inspect tool definitions and outputs.
- Security & privacy — handle credentials and sensitive data.
## External References