Subgraphs

This guide will walk you through using pipeline subgraphs to break down complex pipelines into manageable, maintainable components. We'll transform a monolithic RAG pipeline that has become unwieldy into a well-organized system using focused subgraphs.

Pipeline subgraphs solve the complexity problem by providing a way to break down large pipelines into smaller, manageable pieces. Instead of having one massive pipeline with 20+ steps, you can create focused subgraphs, each responsible for a specific part of your workflow with its own clean state schema and testing strategy.


This tutorial builds upon fundamental pipeline concepts. Ensure you understand basic pipeline construction before proceeding with subgraph architecture.

Prerequisites

This example specifically requires:

  1. Completion of the Your First RAG Pipeline tutorial - understanding of basic pipeline construction

  2. Completion of all setup steps listed on the Prerequisites page

You should be familiar with these concepts and components:

  1. Components in Your First RAG Pipeline - Required foundation

Installation

# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-pipeline gllm-rag gllm-core gllm-generation gllm-inference gllm-retrieval gllm-misc gllm-datastore

Project Setup

1. Understanding the Problem

When you first start building pipelines, everything seems straightforward. You create a few steps, connect them together, and your pipeline works perfectly.

Your pipeline starts simple: take a user query, retrieve some documents, and generate a response. But then you realize you need query preprocessing, document filtering, reranking, context building, prompt engineering, etc. Suddenly, your pipeline has grown from 3 steps to 15 steps.

Key Pain Points:

  • Pipeline Bloat: Simple pipelines grow into 15+ step monsters

  • State Chaos: Dozens of intermediate variables with unclear purposes

  • Testing Nightmares: Can't test components in isolation

  • Maintenance Difficulties: Changes in one area break unrelated functionality

2. The Subgraph Solution

Subgraphs solve these problems by providing:

  • Modular Design: Break complex pipelines into focused, manageable pieces

  • State Isolation: Each subgraph has its own clean state schema

  • Reusability: Use the same subgraph in multiple pipelines

  • Clearer Organization: Logical grouping of related functionality

  • Easier Maintenance: Modify one subgraph without affecting others
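The state isolation idea can be sketched in plain Python. This is not the gllm-pipeline API: `run_subgraph`, `output_map`, and the direction of both mappings are assumptions made for illustration. The subgraph sees only the keys mapped in and returns only the keys mapped out, so its intermediate variables never reach the parent state.

```python
from typing import Any, Callable

State = dict[str, Any]
Step = Callable[[State], State]


def run_subgraph(
    parent: State,
    steps: list[Step],
    input_map: dict[str, str],
    output_map: dict[str, str],
) -> State:
    """Run steps on an isolated state, then copy mapped results back.

    input_map:  subgraph key -> parent key (values copied in)
    output_map: parent key -> subgraph key (values copied out)
    Both directions are assumptions of this sketch.
    """
    # The subgraph starts with only the keys it was explicitly given.
    local: State = {sub: parent[par] for sub, par in input_map.items()}
    for step in steps:
        local = {**local, **step(local)}
    # Intermediate keys stay in `local`; only mapped outputs reach the parent.
    updates = {par: local[sub] for par, sub in output_map.items()}
    return {**parent, **updates}


def normalize(state: State) -> State:
    return {"cleaned": state["query"].strip().lower()}


def expand(state: State) -> State:
    return {"expanded": state["cleaned"] + " definition"}


result = run_subgraph(
    {"user_query": "  What is RAG?  "},
    steps=[normalize, expand],
    input_map={"query": "user_query"},
    output_map={"processed_query": "expanded"},
)
print(result)
```

The parent state gains `processed_query` only; `cleaned` and `expanded` remain private to the subgraph.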

3. Project Structure

Create your project structure for the subgraph refactoring:

<your-project>/
├── modules/
│   └── [your actual components]
└── pipeline_builder.py             # 👈 Single file with all subgraphs

We'll organize all subgraphs as methods of a single pipeline builder class, the same pattern used in production implementations.


Problem: The Monolithic Pipeline with Tons of Steps

Let's first examine a typical complex RAG pipeline that has become unwieldy:

1. The Monolithic Pipeline State

This state has 13 different variables, and it is becoming nearly impossible to track what each one does and when it is used.
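As an illustration, a monolithic state of this size might look like the sketch below. Every field name is hypothetical and chosen only to show how input, intermediate, and output values end up mixed in one schema.

```python
from typing import Any, TypedDict


class MonolithicState(TypedDict, total=False):
    # Input
    user_query: str
    # Query preprocessing intermediates
    cleaned_query: str
    expanded_query: str
    query_intent: str
    # Retrieval intermediates
    raw_chunks: list[dict[str, Any]]
    filtered_chunks: list[dict[str, Any]]
    reranked_chunks: list[dict[str, Any]]
    context: str
    # Generation intermediates
    system_prompt: str
    user_prompt: str
    draft_response: str
    is_valid: bool
    # Output
    response: str


print(len(MonolithicState.__annotations__))
```

Only the first and last fields matter to a caller; the other eleven exist purely to pass values between neighbouring steps.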

2. The Monolithic Pipeline Implementation

This pipeline has several problems:

  1. Hard to understand what each section does

  2. Cluttered state with intermediate variables

  3. Difficult to test individual components

  4. Risky changes - modifying one part can break others
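The last problem can be made concrete with a small plain-Python sketch. It assumes a flat list of steps that all mutate one shared dictionary; the step names and keys are hypothetical and shortened to three steps for readability. Renaming a key in the first step breaks a later step that silently depended on it.

```python
from typing import Any, Callable

State = dict[str, Any]


def clean_query(s: State) -> None:
    s["cleaned_query"] = s["user_query"].strip()


def retrieve(s: State) -> None:
    s["chunks"] = [f"doc about {s['cleaned_query']}"]


def build_prompt(s: State) -> None:
    # Silently depends on a key written two steps earlier.
    s["prompt"] = f"Context: {s['chunks']}\nQuestion: {s['cleaned_query']}"


def run_flat(steps: list[Callable[[State], None]], query: str) -> State:
    state: State = {"user_query": query}
    for step in steps:
        step(state)  # every step can read and overwrite every key
    return state


def clean_query_v2(s: State) -> None:
    # A seemingly harmless rename in the first step...
    s["normalized_query"] = s["user_query"].strip()


ok = run_flat([clean_query, retrieve, build_prompt], " what is RAG? ")
try:
    run_flat([clean_query_v2, retrieve, build_prompt], " what is RAG? ")
    broke_downstream = False
except KeyError:
    # ...breaks retrieval, which still reads `cleaned_query`.
    broke_downstream = True
print(sorted(ok), broke_downstream)
```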


Solution: Building with Subgraphs

Now let's refactor this into a clean, modular pipeline builder following real-world patterns:

1) Create the Pipeline Builder Class

1. Create the main pipeline builder

Create pipeline_builder.py:

Benefits:

  • Clean organization: Each subgraph is a separate method

  • Real-world pattern: Matches production implementations

  • Easy to understand: Clear separation of concerns
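The organization described above can be sketched as a class with one method per subgraph. This is a plain-Python stand-in, not the gllm-pipeline API: `RAGPipelineBuilder`, its method names, and the representation of a subgraph as a list of step functions are all assumptions.

```python
from typing import Any, Callable

State = dict[str, Any]
Step = Callable[[State], State]


class RAGPipelineBuilder:
    """Hypothetical builder: one method per subgraph, one method to assemble."""

    def build_query_processing(self) -> list[Step]:
        return [lambda s: {"processed_query": s["user_query"].strip().lower()}]

    def build_retrieval(self) -> list[Step]:
        return [lambda s: {"context": f"[documents matching '{s['processed_query']}']"}]

    def build_generation(self) -> list[Step]:
        return [lambda s: {"response": f"Answer based on {s['context']}"}]

    def build(self) -> list[tuple[str, list[Step]]]:
        # Named subgraphs in execution order.
        return [
            ("query_processing", self.build_query_processing()),
            ("retrieval", self.build_retrieval()),
            ("generation", self.build_generation()),
        ]


subgraphs = RAGPipelineBuilder().build()
print([name for name, _ in subgraphs])
```

Keeping each subgraph behind its own method means a reader can find all retrieval logic in one place, and a test can build that subgraph without constructing the others.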

2. Define the clean main state

Notice how the main state only contains 6 essential variables instead of the original 13!
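A six-field main state could look like the sketch below. The field names are hypothetical; the point is that only values crossing a subgraph boundary belong here, while intermediates live in each subgraph's own state.

```python
from typing import Any, TypedDict


class MainState(TypedDict, total=False):
    user_query: str               # pipeline input
    processed_query: str          # output of query processing
    chunks: list[dict[str, Any]]  # output of retrieval
    context: str                  # output of retrieval
    response: str                 # output of generation
    is_valid: bool                # output of generation


print(len(MainState.__annotations__))
```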

2) Build Individual Subgraphs

1. Create the Query Processing Subgraph

Benefits:

  • Clear responsibility: Only handles query processing

  • Clean state: Just 3 relevant variables

  • Easy testing: Can test query processing in isolation

  • Explicit mapping: Clear input/output contracts
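A minimal sketch of such a subgraph in plain Python follows. It assumes a three-field state (`query`, `cleaned_query`, `processed_query`) and hand-written mapping lines; these names are illustrative and do not come from gllm-pipeline.

```python
from typing import TypedDict


class QueryState(TypedDict, total=False):
    query: str            # mapped in from the parent state
    cleaned_query: str    # intermediate, never leaves the subgraph
    processed_query: str  # mapped out to the parent state


def clean(state: QueryState) -> QueryState:
    # Collapse runs of whitespace and trim both ends.
    return {**state, "cleaned_query": " ".join(state["query"].split())}


def rewrite(state: QueryState) -> QueryState:
    return {**state, "processed_query": state["cleaned_query"].rstrip("?").lower()}


def query_processing_subgraph(parent: dict) -> dict:
    local: QueryState = {"query": parent["user_query"]}  # input mapping
    local = rewrite(clean(local))
    # Output mapping: only `processed_query` is copied back.
    return {**parent, "processed_query": local["processed_query"]}


out = query_processing_subgraph({"user_query": "  What   is RAG? "})
print(out)
```

Because the subgraph takes and returns plain dictionaries, it can be exercised with a one-line call and no retriever or model in place.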

2. Create the Retrieval Subgraph

Benefits:

  • Focused functionality: Only handles document retrieval and context building

  • Isolated state: Contains only retrieval-related variables

  • Reusable: Can be used in different types of pipelines
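The retrieval stage can be sketched the same way. The toy corpus, the `min_score` threshold, and the four step functions below are assumptions standing in for a real retriever, filter, reranker, and context builder.

```python
from typing import Any

State = dict[str, Any]

# Toy corpus with precomputed relevance scores (hypothetical data).
CORPUS = [
    {"text": "RAG combines retrieval with generation.", "score": 0.9},
    {"text": "Bananas are yellow.", "score": 0.1},
    {"text": "Retrieval selects relevant chunks.", "score": 0.7},
]


def retrieve(state: State) -> State:
    # Stand-in for a real retriever: returns the whole toy corpus.
    return {**state, "chunks": list(CORPUS)}


def filter_chunks(state: State) -> State:
    kept = [c for c in state["chunks"] if c["score"] >= state["min_score"]]
    return {**state, "chunks": kept}


def rerank(state: State) -> State:
    ordered = sorted(state["chunks"], key=lambda c: c["score"], reverse=True)
    return {**state, "chunks": ordered}


def build_context(state: State) -> State:
    return {**state, "context": "\n".join(c["text"] for c in state["chunks"])}


def retrieval_subgraph(parent: State, min_score: float = 0.5) -> State:
    local: State = {"query": parent["processed_query"], "min_score": min_score}
    for step in (retrieve, filter_chunks, rerank, build_context):
        local = step(local)
    # Only the finished context is exposed to the parent.
    return {**parent, "context": local["context"]}


out = retrieval_subgraph({"processed_query": "what is rag"})
print(out["context"])
```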

3. Create the Generation Subgraph

Benefits:

  • Single responsibility: Only handles response generation

  • Clear data flow: Easy to understand prompt → generation → validation flow

  • Independent testing: Can test generation logic separately
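The prompt → generation → validation flow can be sketched as three small steps. The `generate` function here is a deterministic stand-in for a model call, and all names are hypothetical.

```python
from typing import Any

State = dict[str, Any]


def build_prompt(state: State) -> State:
    prompt = f"Context:\n{state['context']}\n\nQuestion: {state['query']}"
    return {**state, "prompt": prompt}


def generate(state: State) -> State:
    # Stand-in for an LLM call: echoes the first context line, if any.
    lines = state["context"].splitlines()
    first_line = lines[0] if lines else ""
    return {**state, "draft": f"Based on the context: {first_line}"}


def validate(state: State) -> State:
    # Minimal check: the draft must contain non-whitespace text.
    return {**state, "is_valid": bool(state["draft"].strip())}


def generation_subgraph(parent: State) -> State:
    local: State = {"query": parent["processed_query"], "context": parent["context"]}
    for step in (build_prompt, generate, validate):
        local = step(local)
    return {**parent, "response": local["draft"], "is_valid": local["is_valid"]}


out = generation_subgraph(
    {
        "processed_query": "what is rag",
        "context": "RAG combines retrieval with generation.\nRetrieval selects relevant chunks.",
    }
)
print(out["response"], out["is_valid"])
```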

3) Run the Subgraph Pipeline

1. Complete the pipeline builder

Add the complete implementation:
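A minimal sketch of the assembled pipeline, assuming each subgraph is a function from parent state to parent state. `build_pipeline` and the three simplified subgraph bodies are hypothetical and written in plain Python rather than with gllm-pipeline classes.

```python
from typing import Any, Callable

State = dict[str, Any]
Subgraph = Callable[[State], State]


def query_processing(parent: State) -> State:
    return {**parent, "processed_query": parent["user_query"].strip().lower()}


def retrieval(parent: State) -> State:
    return {**parent, "context": f"Notes about: {parent['processed_query']}"}


def generation(parent: State) -> State:
    return {**parent, "response": f"Answer using [{parent['context']}]"}


def build_pipeline(subgraphs: list[Subgraph]) -> Subgraph:
    """Chain subgraphs so each receives the state returned by the previous one."""

    def run(state: State) -> State:
        for subgraph in subgraphs:
            state = subgraph(state)
        return state

    return run


pipeline = build_pipeline([query_processing, retrieval, generation])
final_state = pipeline({"user_query": " What is RAG? "})
print(final_state["response"])
```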

2. Run the pipeline

3. Observe the improved output

You should see much cleaner debug output with clearly separated subgraph execution:

Benefits of the subgraph output:

  • Clear boundaries: Easy to see where each logical unit starts and ends

  • Focused debugging: Problems can be isolated to specific subgraphs

  • Progress tracking: Better visibility into pipeline execution progress
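The value of clear boundaries can be illustrated with a small wrapper that records when each subgraph starts, ends, or fails. This is a hypothetical helper, not the library's debug output; the log format is invented for the sketch.

```python
from typing import Any, Callable

State = dict[str, Any]
Subgraph = Callable[[State], State]


def with_boundaries(name: str, subgraph: Subgraph, log: list[str]) -> Subgraph:
    """Wrap a subgraph so its start, end, or failure is recorded in `log`."""

    def wrapped(state: State) -> State:
        log.append(f"[{name}] start")
        try:
            result = subgraph(state)
        except Exception as exc:
            log.append(f"[{name}] failed: {type(exc).__name__}")
            raise
        log.append(f"[{name}] end")
        return result

    return wrapped


log: list[str] = []
steps = [
    with_boundaries("query_processing", lambda s: {**s, "q": s["user_query"].lower()}, log),
    # Deliberately broken: reads a key that no earlier subgraph wrote.
    with_boundaries("retrieval", lambda s: {**s, "context": s["missing_key"]}, log),
]

state: State = {"user_query": "What is RAG?"}
try:
    for step in steps:
        state = step(state)
except KeyError:
    pass

print("\n".join(log))
```

The last log line names the subgraph in which the failure occurred, which narrows the search to that subgraph's steps and its input mapping.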

Comparison: Before vs After

By transforming the monolithic pipeline into subgraphs, we achieved:

  • Before: 15 steps in one pipeline → After: 3 focused subgraphs

  • Before: 13 cluttered state variables → After: 6 clean state variables

  • Before: Hard to test individual components → After: Easy independent testing

  • Before: Changes risk breaking other parts → After: Isolated modifications

  • Before: Unclear responsibilities → After: Single responsibility per subgraph


Troubleshooting

Common Issues

  1. State mapping errors between subgraphs:

    • Ensure all required input states are properly mapped in input_map

    • Verify that output states from one subgraph match input requirements of the next

    • Check that variable names are consistent across subgraph boundaries

  2. Subgraph isolation breaking shared dependencies:

    • Make sure each subgraph includes all components it needs

    • Avoid assuming components are available from other subgraphs

    • Consider creating shared component factories for reusable dependencies

  3. Complex debugging across multiple subgraphs:

    • Use meaningful names for each subgraph for easier identification

    • Enable debug mode to see subgraph boundaries in execution logs

    • Test individual subgraphs in isolation before combining them

  4. Component implementation confusion:

    • Remember that the example components (QueryProcessor, etc.) are placeholders

    • Replace with your actual component implementations

    • Focus on the subgraph structure patterns, not the specific components
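Many state mapping errors from the first issue can be caught before the pipeline runs. The helper below is a sketch under assumptions: it treats `input_map` as a dictionary from subgraph key to parent key, which may differ from the direction your library uses, and `check_input_map` is a hypothetical function rather than part of gllm-pipeline.

```python
def check_input_map(
    input_map: dict[str, str],
    parent_keys: set[str],
    required: set[str],
) -> list[str]:
    """Return human-readable problems; an empty list means the mapping is consistent.

    input_map:   subgraph key -> parent key (assumed direction)
    parent_keys: keys available in the parent state
    required:    subgraph keys that must receive a value
    """
    problems: list[str] = []
    for sub_key in sorted(required - set(input_map)):
        problems.append(f"required subgraph input '{sub_key}' is not mapped")
    for sub_key, parent_key in sorted(input_map.items()):
        if parent_key not in parent_keys:
            problems.append(f"'{sub_key}' maps to unknown parent key '{parent_key}'")
    return problems


problems = check_input_map(
    input_map={"query": "user_query", "top_k": "k"},
    parent_keys={"user_query", "context"},
    required={"query", "filters"},
)
for problem in problems:
    print(problem)
```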

Debug Tips

  1. Test subgraphs individually: Each subgraph should work independently with its own state

  2. Use descriptive subgraph names: This makes debugging much easier

  3. Enable debug logging: Set debug: true to see subgraph execution boundaries

  4. Validate state schemas: Ensure each subgraph's state schema matches its actual usage

  5. Map states explicitly: Always use explicit state mapping rather than relying on defaults

  6. Follow production patterns: Organize subgraphs as methods within a pipeline builder class
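Testing a subgraph individually, as the first tip suggests, can be sketched with the standard `unittest` module. The subgraph under test is a hypothetical plain function; with a real pipeline you would call the subgraph through whatever entry point your builder exposes.

```python
import unittest


def query_processing_subgraph(state: dict) -> dict:
    # Hypothetical subgraph under test.
    return {**state, "processed_query": state["user_query"].strip().lower()}


class QueryProcessingSubgraphTest(unittest.TestCase):
    def test_normalizes_query(self):
        out = query_processing_subgraph({"user_query": "  Hello RAG "})
        self.assertEqual(out["processed_query"], "hello rag")

    def test_preserves_parent_keys(self):
        out = query_processing_subgraph({"user_query": "x", "session_id": "abc"})
        self.assertEqual(out["session_id"], "abc")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(QueryProcessingSubgraphTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test supplies only the state the subgraph declares as input, so a failure points at the subgraph itself rather than at an upstream step.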


Congratulations! You've successfully learned how to use pipeline subgraphs to transform complex, monolithic pipelines into clean, maintainable, and testable modular systems. By following production patterns with a single pipeline builder class, your subgraph organization will be both powerful and practical for real-world applications.
