State

A State is the shared data dictionary that flows through your Pipeline. A State is currently defined as a TypedDict, so keys and value types are explicit. Each step reads from and writes to the State as it executes.
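The read-and-write flow described above can be sketched without the SDK. This is a minimal illustration in plain Python, not the Pipeline's actual execution model: `DemoState`, `normalize_query`, and `answer` are hypothetical names, and the steps are ordinary functions standing in for real pipeline steps.

```python
from typing import TypedDict


class DemoState(TypedDict):
    """Hypothetical two-key state used only for this illustration."""

    user_query: str
    response: str


def normalize_query(state: DemoState) -> DemoState:
    # Reads `user_query` and writes the cleaned value back to the same key.
    return {**state, "user_query": state["user_query"].strip().lower()}


def answer(state: DemoState) -> DemoState:
    # Reads `user_query` and writes a new value to `response`.
    return {**state, "response": f"You asked: {state['user_query']}"}


# Each step receives the state produced by the previous one.
state: DemoState = {"user_query": "  What is RAG?  ", "response": ""}
for step in (normalize_query, answer):
    state = step(state)
```

Because the keys and value types are declared on the TypedDict, a static type checker can flag a step that reads or writes a key that does not exist.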

Default State: RAGState

By default, a Pipeline in our SDK uses RAGState as its state, which is defined in gllm_pipeline.pipeline.states. Its keys are those you would typically find in a Retrieval-Augmented Generation (RAG) pipeline, plus a special event_emitter key that holds an EventEmitter for streaming purposes.

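from typing import Any, TypedDict

# EventEmitter comes from the SDK; its import is not shown in this excerpt.
class RAGState(TypedDict):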
    user_query: str
    queries: list[str]
    retrieval_params: dict[str, Any]
    chunks: list
    history: str
    context: str
    response: str
    references: str | list[str]
    event_emitter: EventEmitter

Defining a Custom State

The default RAGState may not suit every use case. For example, when defining a Subgraph, you may not need all of its keys. Conversely, you may require additional state keys that RAGState does not provide. In these cases, you can define your own state structure.

To do so:

  1. Create state class: Define a TypedDict. This can be in the same file as the Pipeline or in a different module.

  2. Apply to pipeline: Pass the TypedDict into the state_type argument when creating the Pipeline.
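The two steps above might look like the following sketch. `SummaryState` and its keys are hypothetical. The Pipeline call is left as a comment because only the state_type argument is confirmed by this page, and the remaining constructor arguments are elided rather than guessed.

```python
from typing import TypedDict


# Step 1: define the custom state as a TypedDict.
class SummaryState(TypedDict):
    """Hypothetical state for a summarization subgraph."""

    user_query: str
    chunks: list[str]
    summary: str  # extra key that RAGState does not have


# Step 2: pass the class as state_type when creating the Pipeline.
# Other constructor arguments are elided here, not guessed:
# pipeline = Pipeline(..., state_type=SummaryState)

# An initial state is then a plain dict with exactly the declared keys.
initial_state: SummaryState = {
    "user_query": "Summarize the retrieved chunks",
    "chunks": [],
    "summary": "",
}
```

Note that a TypedDict is not enforced at runtime, so a misspelled key is only caught by a static type checker.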

Using a Pydantic BaseModel as a State

In addition to TypedDict, you can also use a Pydantic BaseModel as your state. This provides runtime validation, default values, and enhanced type safety compared to TypedDict. The SDK includes RAGStateModel as a Pydantic alternative to RAGState.

To use a Pydantic BaseModel as your state:

  1. Create state class: Define a Pydantic BaseModel. This can be in the same file as the Pipeline or in a different module.

  2. Apply to pipeline: Pass the Pydantic BaseModel into the state_type argument when creating the Pipeline.
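As a rough sketch of these two steps, the model below mirrors a small custom state. `SummaryStateModel` and its fields are hypothetical, and the Pipeline call is again shown only as a comment with its other arguments elided, since this page confirms only the state_type argument.

```python
from pydantic import BaseModel, Field


# Step 1: define the state as a Pydantic BaseModel.
class SummaryStateModel(BaseModel):
    """Hypothetical Pydantic state for a summarization subgraph."""

    user_query: str
    # default_factory gives every instance its own fresh list.
    chunks: list[str] = Field(default_factory=list)
    summary: str = ""


# Step 2: pass the class as state_type when creating the Pipeline.
# Other constructor arguments are elided here, not guessed:
# pipeline = Pipeline(..., state_type=SummaryStateModel)

# Only the required field has to be supplied; the rest use their defaults.
state = SummaryStateModel(user_query="What is RAG?")
```

Unlike the TypedDict version, the initial state does not need to spell out every key, because the model fills in the declared defaults.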

Pydantic BaseModel comes with the following benefits:

  1. Runtime validation: Automatic type checking and validation.

  2. Default values: Use Field(default=...) or Field(default_factory=...) for defaults.

  3. Enhanced type safety: Better IDE support and error messages.

  4. Custom validation: Add validators using @field_validator or @model_validator.

  5. JSON serialization: Built-in model_dump() and model_dump_json() methods.
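Several of the benefits listed above can be seen in one small model. This is a sketch that assumes Pydantic v2, which is where @field_validator and model_dump() are available. `QueryState`, its fields, and the validator are hypothetical and are not part of the SDK.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator


class QueryState(BaseModel):
    """Hypothetical state showing defaults, validation, and serialization."""

    user_query: str
    queries: list[str] = Field(default_factory=list)  # default value
    top_k: int = Field(default=5)  # default value

    @field_validator("user_query")
    @classmethod
    def query_must_not_be_blank(cls, value: str) -> str:
        # Custom validation: reject blank queries and trim whitespace.
        if not value.strip():
            raise ValueError("user_query must not be blank")
        return value.strip()


# Defaults are filled in and the validator trims the query.
state = QueryState(user_query="  What is RAG?  ")
dumped = state.model_dump()

# Custom validation: a blank query is rejected.
try:
    QueryState(user_query="   ")
    blank_rejected = False
except ValidationError:
    blank_rejected = True

# Runtime validation: a value that cannot be parsed as an int is rejected.
try:
    QueryState(user_query="What is RAG?", top_k="many")
    bad_type_rejected = False
except ValidationError:
    bad_type_rejected = True
```

Calling state.model_dump_json() returns the same data as a JSON string, which can be convenient for logging or persisting a state snapshot.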
