Basic Concepts

An introduction to the concepts in our orchestration component.

A flow diagram of an orchestration pipeline, showing three sequential steps (Step 1, 2, and 3) processing data. Inputs include an Initial State and Runtime Context. Each step reads from and writes to a centralized "State" database and reads from a "Runtime Context" block, ultimately producing a Final State output.
The core orchestration concepts: Pipeline, Steps, State, and Runtime Context.

The diagram above illustrates how the core concepts work together: Pipeline, Steps, State, and Runtime Context. As you read through this guide, refer back to the visualization to see how data flows through your pipeline and how runtime context influences execution without cluttering your state.

If you would rather go straight to implementing a Pipeline, skip ahead to the implementation pages.

Pipeline

The Pipeline is the core orchestration component that sequences and manages the execution of steps in our SDK. It orchestrates sequential step execution where each Step explicitly declares its read/write operations on a shared State object and reads from an immutable Runtime Context. The Pipeline wraps the LangGraph library, which provides a powerful way to define and execute complex workflows.

Our Pipeline:

  1. Orchestrates Steps to run in sequence, managing the flow of data through the shared State.

  2. Can be empty (acts as a pass-through), contain a list of steps, or be composed from other Pipelines.

  3. Supports chaining and nesting so you can build larger flows from smaller ones.

  4. Validates and enforces the structure of the data that moves through it.

  5. Separates mutable application state from immutable execution context, providing transparency in data flow.

For more information about LangGraph, refer to LangGraph overview.

Steps

A Step represents a single action in the Pipeline: it reads the current state, does some work, and returns what changed or was added. Each Step explicitly declares its read/write operations on the shared State object and reads from the immutable Runtime Context. A Step can wrap a component so that it can be executed within the pipeline framework, with automatic input/output handling through the pipeline's State.

Each Step:

  1. Explicitly declares which fields it reads from and writes to in the State, making dependencies clear.

  2. Reads from the immutable Runtime Context for execution metadata without modifying it.

  3. Connects into the Pipeline flow with defined inputs and outputs.

  4. Typically has one entry and one exit, but can also branch or merge data.

  5. Can be chained easily with other steps.
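A step with explicit read/write declarations might look like the following. This is a sketch under assumptions: the `reads`/`writes` class attributes and the `execute()` signature are illustrative stand-ins for whatever declaration mechanism the SDK actually uses.

```python
# Hypothetical sketch of a step that declares its reads/writes up front;
# the SDK's real declaration mechanism may differ.
class RetrieveStep:
    reads = {"query"}              # fields this step consumes from State
    writes = {"retrieved_chunks"}  # fields it adds or overwrites

    def execute(self, state: dict, context: dict) -> dict:
        # context is read-only execution metadata; it is never modified here
        query = state["query"]
        top_k = context.get("top_k", 3)
        # stand-in for a real retriever call
        chunks = [f"chunk about {query}"] * top_k
        return {"retrieved_chunks": chunks}  # return only the declared writes
```

Because the step returns only its declared writes, the Pipeline (and a reader of the code) can see its dependencies at a glance.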

State

A State is the mutable data container that flows through your pipeline. It carries inputs, intermediate results, and outputs between steps. A State acts as the shared context for the entire workflow, evolving as each step modifies fields based on its explicitly defined dependencies. This design provides transparency in data flow and makes it easy to reason about step dependencies.

The State:

  1. Acts as the contract between Steps — each Step explicitly declares which fields it reads and writes.

  2. Evolves through the pipeline as steps modify fields, with each modification clearly traceable.

  3. Encourages predictable schemata so Steps remain reusable and composable.

  4. Supports mapping when entering or leaving nested flows to keep keys aligned.

  5. Remains separate from the immutable Runtime Context, keeping application data distinct from execution metadata.
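One common way to give State a predictable schema is a `TypedDict`; the SDK may use its own base class instead, and the `RAGState` fields shown here are assumed for illustration, as is the `ensure_keys` helper.

```python
# Sketch of a typed State schema; field names are illustrative.
from typing import TypedDict

class RAGState(TypedDict, total=False):
    query: str
    retrieved_chunks: list[str]
    response: str

def ensure_keys(update: dict, allowed: set[str]) -> dict:
    # a lightweight check that a step only writes its declared fields
    unknown = set(update) - allowed
    if unknown:
        raise KeyError(f"step wrote undeclared fields: {unknown}")
    return update
```

A check like `ensure_keys` is one way to enforce the "contract between Steps" described in point 1.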

Runtime Context

Runtime Context is a separate channel for passing execution-time information to your pipeline without mixing it into your State. While State carries the data being processed (inputs, intermediate results, outputs), Runtime Context carries metadata about how to process that data — things like user sessions, feature flags, or environment settings.

The Runtime Context:

  1. Keeps execution metadata separate from business data, so your State schema stays clean and focused.

  2. Is accessible to all Steps during execution but doesn't persist in the State between steps.

  3. Is optional — you only define a context_schema when you need to pass runtime information.

When to use Runtime Context

You should use Runtime Context when:

  1. You need to pass user-specific information (e.g., session_id, user_id) that affects execution but isn't part of the core data flow.

  2. You want to control pipeline behavior with feature flags or configuration without hardcoding values.

  3. You're converting a Pipeline to a tool and need to accept both input data and execution metadata.

You should NOT use Runtime Context when:

  1. The information is part of the actual data being processed. That belongs in State.

  2. The information needs to be persisted or transformed between steps. Use State instead.
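The State/Context split can be sketched in plain Python. Here `MappingProxyType` from the standard library provides the read-only view; the SDK may enforce immutability differently, and `run_with_context` is a hypothetical helper.

```python
# Sketch of keeping Runtime Context immutable and separate from State.
from types import MappingProxyType

def run_with_context(steps, state: dict, context: dict) -> dict:
    frozen = MappingProxyType(context)  # steps can read but not write this
    state = dict(state)
    for step in steps:
        state.update(step(state, frozen))  # only State accumulates changes
    return state
```

Any step that tries to mutate the frozen context raises a `TypeError`, so execution metadata cannot leak into, or be confused with, the evolving State.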

Putting It All Together: A Simple RAG Example

Let's see how these concepts work together in a typical RAG (Retrieval-Augmented Generation) pipeline:

A simple RAG pipeline using the orchestration concepts.

The Pipeline orchestrates two primary execution phases in sequence:

  1. Retrieval: Fetches relevant information from the knowledge base.

  2. Generation: Consolidates information into a coherent answer.

The Steps each perform a specific task by interacting with shared resources:

  1. Retrieval Step: Reads the query from State and uses runtime parameters (like index_name and top_k) to find and write back relevant chunks.

  2. Generation Step: Reads the retrieved chunks and the query from State to generate and write the final response.

The State acts as the mutable data container that evolves through the pipeline:

  1. query: The initial user input that triggers the process.

  2. retrieved_chunks: The context-specific data chunks added during the Retrieval phase.

  3. response: The final generated output added during the Generation phase.

The Runtime Context provides static execution metadata required for the operation:

  1. model_name: Specifies which LLM to use for generation.

  2. index_name: Identifies the vector database or search index to query.

  3. top_k: Determines the number of document chunks to retrieve.

  4. user_id: Manages identity, permissions, or personalized configurations.

Notice how State carries the data being processed (the "what") while Runtime Context carries information about how to process it (the "how"). Each Step reads what it needs from both containers, runs its execute() method, and writes updates back to the State, all orchestrated by the Pipeline.
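The RAG flow above can be sketched end to end. The retrieval and generation bodies are stubbed out, and every name here (`retrieval_step`, `generation_step`, `run_rag`) is illustrative rather than the SDK's API; only the State fields (`query`, `retrieved_chunks`, `response`) and context keys (`model_name`, `index_name`, `top_k`, `user_id`) come from the example above.

```python
# End-to-end sketch of the RAG flow, with stubbed retrieval and generation.
def retrieval_step(state, context):
    # reads: query; writes: retrieved_chunks
    k = context["top_k"]
    index = context["index_name"]
    return {"retrieved_chunks": [
        f"[{index}] passage {i} for '{state['query']}'" for i in range(k)
    ]}

def generation_step(state, context):
    # reads: query, retrieved_chunks; writes: response
    joined = " ".join(state["retrieved_chunks"])
    return {"response": (
        f"({context['model_name']}) Answer to "
        f"'{state['query']}' using: {joined}"
    )}

def run_rag(query: str, context: dict) -> dict:
    state = {"query": query}  # initial State
    for step in (retrieval_step, generation_step):
        state.update(step(state, context))  # context is read, never written
    return state
```

Running it shows the State accumulating `retrieved_chunks` and then `response`, while the context dictionary passes through untouched.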


When to Use Pipeline

The orchestration components in this SDK are optional. You are not required to build a full Pipeline with custom States and Steps for every use case. The framework is designed to let you opt-in to features as your requirements grow.

| Orchestration Component | Use When... | Don't Use When... |
| --- | --- | --- |
| Pipeline | You need to orchestrate complex workflows or visualize execution flows. | You have a simple linear script and don't need graph capabilities. |
| Steps | You need robustness (retries, caching, error handling) or observability. | You are writing a quick prototype or simple function logic. |
| State | You want custom or stricter contracts for your data flow. | The built-in RAGState covers most standard RAG use cases. |
| Runtime Context | You need execution metadata (e.g. session_id) or static configuration. | You don't need anything other than what is in the State. |

Advanced Composition

Once you understand how Pipeline, Steps, State, and Runtime Context work together, you can start composing more complex workflows using Subgraphs.

Subgraph

A Subgraph is a reusable Pipeline that can be embedded inside another Pipeline.

A Subgraph:

  1. Encapsulates a sequence of actions, so complex logic stays modular and easy to reuse.

  2. Maps inputs from the parent flow into its flow.

  3. Improves error context by indicating which inner action failed.
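The key-mapping behavior in point 2 can be sketched as a wrapper that translates parent keys into the child flow's keys on entry and back on exit. `as_subgraph` and its mapping arguments are hypothetical, not the SDK's real API.

```python
# Hypothetical helper: embed a list of child steps as one step of a
# parent flow, mapping keys on the way in and out.
def as_subgraph(steps, input_map: dict, output_map: dict):
    """input_map: parent key -> child key; output_map: child key -> parent key."""
    def run(parent_state: dict) -> dict:
        # map parent keys into the child's own State schema on entry
        child = {ck: parent_state[pk] for pk, ck in input_map.items()}
        for step in steps:
            child.update(step(child))
        # map the child's results back to the parent's keys on exit
        return {pk: child[ck] for ck, pk in output_map.items()}
    return run
```

The parent flow sees a single step with its own keys, while the child keeps its reusable schema, which is what lets the same Subgraph be dropped into different Pipelines.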

When to use a Subgraph

You should consider using a Subgraph when:

  1. The same group of actions appears in more than one Pipeline.

  2. A portion of your Pipeline is conceptually one functionality (e.g. "retrieve", "format").

  3. You intend to evolve or swap implementations without changing the parent flow.

You should NOT use a Subgraph when:

  1. The sequence is tiny, only used once, and shares the same State schema as the parent.

  2. Nesting adds more mental overhead than it removes.
