Structured Output

In many real-world applications, we don't just want natural-language responses; we want structured data that our programs can parse and use directly.

The good news? You don't need to embed a JSON schema in your prompt. Instead, define your expected output and pass it through the output_schema parameter of invoke(), as either:

  1. A Pydantic BaseModel class (recommended)

  2. A JSON schema dictionary compatible with Pydantic's schema format

When structured output is enabled:

  1. The model will not stream.

  2. The output will be stored in the .structured_output attribute of the response.

  3. The output type depends on the input schema:

    1. Pydantic BaseModel class → an instance of that class

    2. JSON schema dict → Python dictionary
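Because the type of structured_output depends on which kind of schema you passed, code that must accept both may want to normalize it first. A minimal sketch, assuming Pydantic v2 (whose model instances expose model_dump()); to_plain_dict is a hypothetical helper, not part of gllm_inference.

```python
from typing import Any, Dict


def to_plain_dict(structured_output: Any) -> Dict[str, Any]:
    """Return structured output as a plain dict, whichever schema type produced it.

    Hypothetical helper: Pydantic v2 model instances (BaseModel schema) are
    detected by their model_dump() method; dicts (JSON schema) pass through.
    """
    if hasattr(structured_output, "model_dump"):
        return structured_output.model_dump()
    if isinstance(structured_output, dict):
        return structured_output
    raise TypeError(
        f"Unexpected structured output type: {type(structured_output).__name__}"
    )


# Usage with a dict result, as produced by a JSON schema
data = to_plain_dict({"location": "Tokyo, Japan", "activities": []})
print(data["location"])  # Tokyo, Japan
```

Duck typing on model_dump() keeps the helper free of a hard Pydantic import; an isinstance check against BaseModel would work just as well if Pydantic is always installed.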

Example 1: Using a Pydantic BaseModel

You can define your expected response format as a Pydantic class. This ensures strong type safety and makes the output easier to work with in Python.

import asyncio
from typing import List

from pydantic import BaseModel

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.prompt_builder import PromptBuilder

class Activity(BaseModel):
    type: str
    activity_location: str
    description: str

class ActivityList(BaseModel):
    location: str
    activities: List[Activity]

system_template = (
    "You are a helpful assistant who specializes in recommending activities.\n"
)
user_template = "{question}"

builder = PromptBuilder(system_template=system_template, user_template=user_template)
prompt = builder.format(question="I want to go to Tokyo, Japan. What should I do?")

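# Create the invoker. The constructor call is an assumption (model name as the
# first positional argument); check the OpenAILMInvoker reference for the exact
# signature and how the API key is supplied.
lm_invoker = OpenAILMInvoker("gpt-4o-mini")

response = asyncio.run(lm_invoker.invoke(prompt, output_schema=ActivityList))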
print(f"Response: {response}")

Output:

LMOutput(
    structured_output=ActivityList(
        location="Tokyo, Japan",
        activities=[
            Activity(
                type="Cultural Experience",
                activity_location="Asakusa",
                description="Visit the iconic Senso-ji Temple."
            ),
            Activity(
                type="Shopping",
                activity_location="Shibuya",
                description="Experience the bustling Shibuya Crossing."
            ),
        ]
    )
)
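Because structured_output here is an ActivityList instance, its fields are typed attributes rather than dictionary keys. The sketch below re-declares the Example 1 models so it runs on its own and builds an instance by hand from the sample output above, standing in for response.structured_output (no API call is made). It assumes Pydantic v2 for model_dump() and model_dump_json().

```python
from typing import List

from pydantic import BaseModel


# Same models as in Example 1
class Activity(BaseModel):
    type: str
    activity_location: str
    description: str


class ActivityList(BaseModel):
    location: str
    activities: List[Activity]


# Stand-in for response.structured_output, built from the sample output above
result = ActivityList(
    location="Tokyo, Japan",
    activities=[
        Activity(type="Cultural Experience", activity_location="Asakusa",
                 description="Visit the iconic Senso-ji Temple."),
        Activity(type="Shopping", activity_location="Shibuya",
                 description="Experience the bustling Shibuya Crossing."),
    ],
)

# Attribute access: result.activities is a list of Activity objects
for activity in result.activities:
    print(f"[{activity.type}] {activity.activity_location}: {activity.description}")

# Serialize back to JSON, e.g. for logging or an HTTP response
print(result.model_dump_json(indent=2))
```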

Example 2: Using a JSON Schema Dictionary

Alternatively, you can define the structure using a JSON schema dictionary. This is useful in situations where dynamic schema generation is required or when operating in environments that don’t use Pydantic.
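As an illustration of dynamic generation, the sketch below assembles the same ActivityList schema used in this example from a list of field names decided at runtime. The object_schema helper is hypothetical, not part of gllm_inference; it simply marks every listed property as required and disallows extra keys, mirroring the hand-written schema.

```python
from typing import Any, Dict, List, Optional


def object_schema(properties: Dict[str, Any], title: Optional[str] = None) -> Dict[str, Any]:
    """Build an object schema where every property is required and no extras are allowed.

    Hypothetical helper for illustration; not part of gllm_inference.
    """
    schema: Dict[str, Any] = {
        "type": "object",
        "properties": properties,
        "required": list(properties),
        "additionalProperties": False,
    }
    if title is not None:
        schema["title"] = title
    return schema


# Field names decided at runtime (e.g. loaded from config or user input)
activity_fields: List[str] = ["type", "activity_location", "description"]

activity_item = object_schema({name: {"type": "string"} for name in activity_fields})
activity_schema = object_schema(
    {
        "location": {"type": "string"},
        "activities": {"type": "array", "items": activity_item},
    },
    title="ActivityList",
)
```

The resulting dictionary is equal to the hand-written activity_schema, so it could be passed to output_schema in the same way.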

import asyncio

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_inference.prompt_builder import PromptBuilder

# Define JSON schema
activity_schema = {
    "title": "ActivityList",
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "activities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "type": {"type": "string"},
                    "activity_location": {"type": "string"},
                    "description": {"type": "string"},
                },
                "required": ["type", "activity_location", "description"],
                "additionalProperties": False
            }
        }
    },
    "required": ["location", "activities"],
    "additionalProperties": False
}

system_template = (
    "You are a helpful assistant who specializes in recommending activities.\n"
)
user_template = "{question}"

builder = PromptBuilder(system_template=system_template, user_template=user_template)
prompt = builder.format(question="I want to go to Tokyo, Japan. What should I do?")

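# Create the invoker. The constructor call is an assumption (model name as the
# first positional argument); check the OpenAILMInvoker reference for the exact
# signature and how the API key is supplied.
lm_invoker = OpenAILMInvoker("gpt-4o-mini")

response = asyncio.run(lm_invoker.invoke(prompt, output_schema=activity_schema))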
print(f"Response: {response}")

Output:

LMOutput(
    structured_output={
        "location": "Tokyo, Japan",
        "activities": [
            {
                "type": "Cultural Experience",
                "activity_location": "Asakusa",
                "description": "Visit the iconic Senso-ji Temple."
            },
            {
                "type": "Shopping",
                "activity_location": "Shibuya",
                "description": "Experience the bustling Shibuya Crossing."
            }
        ]
    }
)
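With a JSON schema, structured_output is a plain dict, so fields are read with key lookups and nothing on the Python side enforces their presence or types. A minimal sketch that walks activity_schema and reports missing required keys in the sample output above; missing_required is a hypothetical, hand-rolled check that only follows "properties", "items" and "required" (a full validator such as the jsonschema package's validate() covers the whole spec).

```python
from typing import Any, Dict, List

# Same schema as above
activity_schema = {
    "title": "ActivityList",
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "activities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "type": {"type": "string"},
                    "activity_location": {"type": "string"},
                    "description": {"type": "string"},
                },
                "required": ["type", "activity_location", "description"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["location", "activities"],
    "additionalProperties": False,
}


def missing_required(data: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Collect paths of required keys that are absent (types are not checked)."""
    missing: List[str] = []
    if schema.get("type") == "object" and isinstance(data, dict):
        for key in schema.get("required", []):
            if key not in data:
                missing.append(f"{path}.{key}")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in data:
                missing += missing_required(data[key], sub_schema, f"{path}.{key}")
    elif schema.get("type") == "array" and isinstance(data, list):
        for i, item in enumerate(data):
            missing += missing_required(item, schema.get("items", {}), f"{path}[{i}]")
    return missing


# Stand-in for response.structured_output, copied from the sample output above
result = {
    "location": "Tokyo, Japan",
    "activities": [
        {"type": "Cultural Experience", "activity_location": "Asakusa",
         "description": "Visit the iconic Senso-ji Temple."},
        {"type": "Shopping", "activity_location": "Shibuya",
         "description": "Experience the bustling Shibuya Crossing."},
    ],
}

problems = missing_required(result, activity_schema)
if problems:
    raise ValueError(f"Structured output is missing required keys: {problems}")

for activity in result["activities"]:
    print(f'{activity["activity_location"]}: {activity["description"]}')
```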

Generate JSON Schema from a BaseModel

If you're using Pydantic and want to generate the JSON Schema automatically, you can convert your model like this:

schema_dict = ActivityList.model_json_schema()

This lets you produce a schema dictionary for environments where Pydantic models aren't accepted directly, while your Pydantic class remains the single source of truth for the types.
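To see what model_json_schema() actually produces, the sketch below re-declares the Example 1 models and inspects the result (Pydantic v2). Unlike the hand-written activity_schema, Pydantic places nested models under "$defs" and references them with "$ref", and it only emits "additionalProperties": false when a model forbids extra fields; the ConfigDict(extra="forbid") lines are an illustrative addition that also makes validation reject unknown keys. The last part validates a plain dict, such as a JSON-schema result, back into the typed model.

```python
import json
from typing import List

from pydantic import BaseModel, ConfigDict


class Activity(BaseModel):
    # Illustrative addition: extra="forbid" makes Pydantic emit
    # "additionalProperties": false and reject unknown keys on validation
    model_config = ConfigDict(extra="forbid")

    type: str
    activity_location: str
    description: str


class ActivityList(BaseModel):
    model_config = ConfigDict(extra="forbid")

    location: str
    activities: List[Activity]


schema_dict = ActivityList.model_json_schema()
# Nested models land under "$defs" and are referenced with "$ref"
print(json.dumps(schema_dict, indent=2))

# A dict result (e.g. structured_output from a JSON-schema call) can be
# validated back into the model to regain typed attribute access
raw = {
    "location": "Tokyo, Japan",
    "activities": [
        {"type": "Cultural Experience", "activity_location": "Asakusa",
         "description": "Visit the iconic Senso-ji Temple."},
    ],
}
typed = ActivityList.model_validate(raw)
print(typed.activities[0].activity_location)  # Asakusa
```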
