Error Handling

The Pipeline SDK lets you configure an error handling strategy for each step. The strategy determines how an exception raised during step execution is processed and what happens to the pipeline state afterward.

By default, all steps use the Raise strategy, which re-raises exceptions with enhanced context information. This is the strictest approach and ensures that errors immediately stop pipeline execution with detailed error messages.

from gllm_pipeline.steps._func import transform
from gllm_pipeline.steps.step_error_handler import RaiseStepErrorHandler

def may_fail(data: dict) -> str:
    if not data.get("valid"):
        raise ValueError("Invalid input")
    return data["text"].upper()

# By default, errors will be raised with context
step = transform(
    may_fail,
    input_map=["text", "valid"],
    output_state="result",
    # error_handler=RaiseStepErrorHandler()  # 👈 This is the default
)

When an error occurs, you'll get enhanced error messages like:

Error in Transform 'Transform_may_fail__abc123' during execution. 
Error type: ValueError. Original error: Invalid input

This default behavior is suitable for most production use cases where you want to catch and handle errors explicitly rather than silently continuing with potentially corrupted state.

Error Handling Strategies

The SDK provides four built-in error handling strategies, each designed for different use cases.

Raise (Default)

Re-raises the exception with enhanced context. The pipeline stops immediately and the error is propagated to the caller.

When to use:

  1. Production pipelines where data integrity is critical

  2. When you want to catch and handle errors explicitly in your application code

  3. When you need detailed error information for debugging

Example:
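The following is a minimal, library-independent sketch of the Raise behavior, not the SDK's own implementation. `StepError` and `run_with_raise` are hypothetical names introduced only for illustration, and the message format simply mirrors the sample error message shown earlier.

```python
class StepError(RuntimeError):
    """Hypothetical wrapper exception; not an SDK class."""


def run_with_raise(step_name, func, state):
    """Run func(state); on failure, re-raise with step context attached."""
    try:
        return func(state)
    except Exception as exc:
        # Chain the original exception so the full traceback is preserved.
        raise StepError(
            f"Error in Transform '{step_name}' during execution. "
            f"Error type: {type(exc).__name__}. Original error: {exc}"
        ) from exc


def may_fail(data: dict) -> str:
    if not data.get("valid"):
        raise ValueError("Invalid input")
    return data["text"].upper()


# The caller decides what to do with the failure.
try:
    run_with_raise("may_fail", may_fail, {"text": "hi", "valid": False})
except StepError as err:
    message = str(err)
    cause = err.__cause__
```

Because the wrapper chains the original exception, application code can inspect both the contextual message and the underlying error type.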

Keep

Preserves the current state without modification when an error occurs. The step is skipped and the pipeline continues with the next step.

When to use:

  1. When a step is optional and failures can be safely ignored

  2. When you want to continue processing even if some steps fail

  3. For non-critical enrichment steps

Example:
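As a rough, library-independent sketch of the Keep behavior (the SDK's handler class is not shown in this section, so `run_with_keep` and `enrich` are hypothetical stand-ins), a failing step leaves the state exactly as it was:

```python
def run_with_keep(func, state, output_key):
    """Run func(state); on failure, return the state untouched."""
    try:
        return {**state, output_key: func(state)}
    except Exception:
        # The step is skipped: no key is added, nothing is overwritten.
        return state


def enrich(state):
    # Raises KeyError when the optional "profile" entry is absent.
    return state["profile"]["city"]


before = {"user": "ada"}
after = run_with_keep(enrich, before, "city")
```

Note that downstream steps cannot tell from the state alone that the step failed, which is why this strategy fits only optional steps.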

Empty

Sets the output state(s) to None when an error occurs, then continues execution.

When to use:

  1. When you want to explicitly mark that a step failed while continuing

  2. When downstream steps can handle None values gracefully

  3. For conditional processing based on whether a step succeeded

Example:
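A minimal sketch of the Empty behavior, written without the SDK so it runs on its own. `run_with_empty` and `parse_age` are hypothetical names; the point is only that the output key is set to None on failure so a later step can branch on it.

```python
def run_with_empty(func, state, output_key):
    """Run func(state); on failure, write None to the output key."""
    try:
        value = func(state)
    except Exception:
        value = None  # explicit marker that the step failed
    return {**state, output_key: value}


def parse_age(state):
    return int(state["age"])  # raises ValueError on non-numeric input


state = run_with_empty(parse_age, {"age": "n/a"}, "age_years")

# A downstream step can check for None and choose a different path.
label = "unknown" if state["age_years"] is None else f"{state['age_years']} years"
```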

Fallback

Executes a custom fallback function when an error occurs. This is the most flexible approach, allowing you to define custom recovery logic.

When to use:

  1. When you have a specific fallback behavior for failures

  2. When you want to log errors and provide default values

  3. For graceful degradation with alternative processing

Example:
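The sketch below illustrates the Fallback idea in plain Python. The fallback signature used here (the exception plus the current state) is an assumption made for this example and may differ from what the SDK actually passes; `run_with_fallback`, `summarize`, and `truncate_instead` are hypothetical names.

```python
import logging

logger = logging.getLogger("pipeline_sketch")


def run_with_fallback(func, state, output_key, fallback):
    """Run func(state); on failure, use fallback(exc, state) as the output."""
    try:
        value = func(state)
    except Exception as exc:
        value = fallback(exc, state)
    return {**state, output_key: value}


def summarize(state):
    # Stands in for a call to an unreliable external service.
    raise TimeoutError("summarizer unavailable")


def truncate_instead(exc, state):
    # Log the failure, then degrade gracefully to a cheap alternative.
    logger.warning("summarize failed (%s); using truncated text", exc)
    return state["text"][:20]


result = run_with_fallback(
    summarize,
    {"text": "A long document about error handling"},
    "summary",
    truncate_instead,
)
```

Keeping the logging inside the fallback means the failure is recorded even though the pipeline continues.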

Error Context

When an error occurs, the SDK automatically captures detailed context information using the ErrorContext model. This context includes:

  • exception: The original exception that was raised

  • step_name: The name of the step where the error occurred

  • step_type: The type of step (e.g., "Transform", "Conditional")

  • state: The pipeline state at the time of the error

  • operation: Description of the operation being performed

  • additional_context: Any additional context information

This context is automatically included in error messages and passed to error handlers, so a handler or log entry can identify which step failed, what it was doing, and what state it was given.
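To make the field list concrete, here is a stand-in data class with the same field names. It is not the SDK's ErrorContext model: the types, the defaults, and the `format_error` helper are assumptions, and the message layout only imitates the sample error message shown earlier.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ErrorContextSketch:
    """Illustrative stand-in mirroring the documented field names."""

    exception: Exception
    step_name: str
    step_type: str
    state: Optional[dict[str, Any]] = None
    operation: str = "execution"
    additional_context: Optional[str] = None


def format_error(ctx: ErrorContextSketch) -> str:
    """Build a message from the context (hypothetical helper)."""
    message = (
        f"Error in {ctx.step_type} '{ctx.step_name}' during {ctx.operation}. "
        f"Error type: {type(ctx.exception).__name__}. "
        f"Original error: {ctx.exception}"
    )
    if ctx.additional_context:
        message += f" Context: {ctx.additional_context}"
    return message


ctx = ErrorContextSketch(
    exception=ValueError("Invalid input"),
    step_name="Transform_may_fail__abc123",
    step_type="Transform",
    state={"valid": False},
)
message = format_error(ctx)
```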

Best Practices

1. Choose the Right Strategy

  • Use Raise by default for production pipelines

  • Use Keep for truly optional steps that will not affect downstream processing

  • Use Empty when downstream steps need to know a step failed

  • Use Fallback for graceful degradation with alternative logic

2. Handle Errors at the Right Level

3. Validate Critical Data Early

4. Document Error Behavior

By choosing the appropriate error handling strategy for each step, you can build resilient pipelines that gracefully handle failures while maintaining data integrity.
