Hybrid Deep Researcher Pipeline

Overview

This guide demonstrates how to run multiple deep research systems in parallel within a Pipeline to obtain comprehensive research results from different sources. By running OpenAIDeepResearcher and GLOpenDeepResearcher simultaneously, you can combine their unique strengths and produce a more thorough analysis.

The Pipeline decides when deep research is needed, executes both researchers concurrently in that case, and synthesizes their outputs into a unified response.

See the complete code on GitHub.

Prerequisites

This example specifically requires:

Completion of all setup steps listed on the Prerequisites page.

You should be familiar with these:

Installation

# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" python-dotenv gllm-core gllm-generation gllm-inference gllm-pipeline gl-odr-sdk

For Windows (Command Prompt):

FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" python-dotenv gllm-core gllm-generation gllm-inference gllm-pipeline gl-odr-sdk

Project Setup

Clone the repository

git clone https://github.com/gl-sdk/gen-ai-sdk-cookbook.git
cd gen-ai-sdk-cookbook/deep-research

Set UV authentication and install dependencies

Unix-based systems (Linux, macOS):

./setup.sh

For Windows:

setup.bat

Prepare .env file

OPENAI_API_KEY="..."
GLODR_API_KEY="..."
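
Before running the pipeline, it can help to confirm that both keys are actually visible to the process. The snippet below is a minimal sketch and not part of the cookbook: `missing_keys` is a hypothetical helper, and it assumes `python-dotenv` (installed above) is what loads the `.env` file. If that package cannot be imported, the snippet simply checks the existing environment.

```python
import os

# The two variables this example's .env file defines.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GLODR_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required keys that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    try:
        # python-dotenv reads KEY=value pairs from .env into os.environ.
        from dotenv import load_dotenv

        load_dotenv()
    except ImportError:
        pass  # Fall back to whatever is already in the environment.

    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running this check from the same directory as the `.env` file reports any key that is still unset before the pipeline makes its first API call.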

Implementation

In this example, we use parallel execution to run two different deep research systems simultaneously. The Pipeline handles routing logic, executes both researchers in parallel when deep research is needed, and then combines their results using a ResponseSynthesizer.

import asyncio

from gllm_core.event import EventEmitter
from gllm_generation.deep_researcher import GLOpenDeepResearcher, OpenAIDeepResearcher
from gllm_generation.deep_researcher.gl_open_deep_researcher import ResearchProfile
from gllm_generation.response_synthesizer import ResponseSynthesizer
from gllm_inference.lm_invoker.openai_lm_invoker import OpenAILMInvoker
from gllm_inference.output_parser.json_output_parser import JSONOutputParser
from gllm_inference.prompt_builder import PromptBuilder
from gllm_inference.request_processor import LMRequestProcessor
from gllm_inference.schema import LMOutput
from gllm_pipeline.router import LMBasedRouter
from gllm_pipeline.steps import parallel, step, switch
from pydantic import BaseModel


class DeepResearchState(BaseModel):
    """State for the deep research pipeline.

    Attributes:
        user_query (str): The user's research query.
        route (str | None): The routing decision (deep_research or normal).
        openai_result (LMOutput | None): Result from OpenAI deep researcher.
        glopen_result (LMOutput | None): Result from GL Open deep researcher.
        combined_result (str | LMOutput | None): Final combined result.
        event_emitter (EventEmitter): Event emitter for streaming.
    """

    user_query: str
    route: str | None = None
    openai_result: LMOutput | None = None
    glopen_result: LMOutput | None = None
    combined_result: str | LMOutput | None = None
    event_emitter: EventEmitter

    class Config:
        arbitrary_types_allowed = True


# Router LM Request Processor
lmrp = LMRequestProcessor(
    prompt_builder=PromptBuilder(
        user_template="""
        Based on the following user query, determine if it is a deep research query or a normal query.

        - **normal**: Casual greetings, small talk, or simple conversational queries that do not require
          in-depth research. Examples: "hello", "how are you", "what's the weather", "thanks", "goodbye".

        - **deep_research**: Queries that require comprehensive research, multi-source analysis, or
          in-depth exploration of a topic. Examples: "research the latest AI trends", "compare X vs Y",
          "analyze the market for...", "what are the pros and cons of...".

        Output the answer in JSON format with "route" as the key. For example:
        {{"route": "deep_research"}} or {{"route": "normal"}}

        Query: {text}
        """
    ),
    lm_invoker=OpenAILMInvoker(model_name="gpt-4o-mini"),
    output_parser=JSONOutputParser(),
)

# Step 1: Router
router = step(
    component=LMBasedRouter(
        valid_routes={"deep_research", "normal"},
        lm_request_processor=lmrp,
        default_route="normal",
    ),
    input_map={"text": "user_query"},
    output_state="route",
)

# Step 2a: OpenAI Deep Researcher
openai_deep_researcher = step(
    component=OpenAIDeepResearcher(model_name="o4-mini-deep-research"),
    input_map={"query": "user_query", "event_emitter": "event_emitter"},
    output_state="openai_result",
)

# Step 2b: GL Open Deep Researcher with the INTERNAL profile
glopen_deep_researcher = step(
    component=GLOpenDeepResearcher(profile="INTERNAL"),
    input_map={"query": "user_query", "event_emitter": "event_emitter"},
    output_state="glopen_result",
)

# Step 3: Parallel execution of both deep researchers
parallel_deep_research = parallel(
    [openai_deep_researcher, glopen_deep_researcher],
    name="parallel_deep_research",
)

# Step 4: Response Synthesizer to combine results
reporter = step(
    component=ResponseSynthesizer.stuff_preset(
        model_id="openai/gpt-4o-mini",
        user_template="""
        You are tasked with combining research results from two different deep research systems.

        **OpenAI Deep Research Result:**
        {openai_result}

        **GL Open Deep Research Result:**
        {glopen_result}

        Please synthesize these two research results into a comprehensive, coherent answer that:
        1. Combines insights from both sources
        2. Highlights any complementary information
        3. Notes any contradictions or differences in findings
        4. Provides a unified, well-structured response

        Original Query: {query}
        """,
    ),
    input_map={
        "query": "user_query",
        "openai_result": "openai_result",
        "glopen_result": "glopen_result",
        "event_emitter": "event_emitter",
    },
    output_state="combined_result",
)

# Deep research branch: parallel execution + combination
deep_research_branch = parallel_deep_research | reporter

# Normal response for non-research queries
normal_response_synthesizer = step(
    component=ResponseSynthesizer.stuff_preset(
        model_id="openai/gpt-4o-mini",
        user_template="{query}",
    ),
    input_map={"query": "user_query", "event_emitter": "event_emitter"},
    output_state="combined_result",
)

# Conditional step based on routing
conditional_step = switch(
    condition=lambda state: state["route"],
    branches={
        "deep_research": deep_research_branch,
        "normal": normal_response_synthesizer,
    },
)

# Complete pipeline
deep_research_pipeline = router | conditional_step
deep_research_pipeline.state_type = DeepResearchState


async def main() -> None:
    event_emitter = EventEmitter.with_print_handler()
    state = DeepResearchState(
        user_query="research about the latest trends in AI",
        event_emitter=event_emitter,
    )

    result = await deep_research_pipeline.invoke(state)

    print(result)


if __name__ == "__main__":
    asyncio.run(main())
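
The router prompt above asks the model for a JSON object with a `route` key, and the router is configured with a set of valid routes plus a default. As a rough illustration of that idea only, the standalone sketch below validates a raw model reply and falls back to the default when the reply is unusable. `resolve_route` is a hypothetical helper built on the standard `json` module; it is not how `LMBasedRouter` is implemented.

```python
import json

VALID_ROUTES = {"deep_research", "normal"}
DEFAULT_ROUTE = "normal"


def resolve_route(raw_reply, valid_routes=VALID_ROUTES, default=DEFAULT_ROUTE):
    """Extract `route` from a JSON reply, falling back to `default` if unusable."""
    try:
        payload = json.loads(raw_reply)
    except (json.JSONDecodeError, TypeError):
        return default  # The reply was not valid JSON text.
    if not isinstance(payload, dict):
        return default  # Valid JSON, but not an object with keys.
    route = payload.get("route")
    # Only accept string routes that are explicitly allowed.
    if isinstance(route, str) and route in valid_routes:
        return route
    return default


if __name__ == "__main__":
    print(resolve_route('{"route": "deep_research"}'))
    print(resolve_route('{"route": "something_else"}'))
    print(resolve_route("not json at all"))
```

Defaulting to `normal` means a malformed reply costs one cheap response rather than two full research runs.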

Run the script

uv run 03_hybrid_deep_research_pipeline.py

How it works:

1. Router Step: The Pipeline evaluates the user query and determines whether it requires deep research or a simple response.
2. Parallel Deep Research: If deep research is needed, the Pipeline executes both OpenAI Deep Researcher and GL Open Deep Researcher simultaneously (not sequentially).
3. Reporter: Once both researchers complete, a ResponseSynthesizer combines their outputs into a unified, comprehensive answer.
4. Normal Response: For casual queries, the Pipeline skips deep research and routes to a simple response synthesizer.
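
The same control flow can be sketched with plain `asyncio`, without the SDK. Everything below is a stand-in: `route_query`, `research_a`, `research_b`, and `synthesize` are hypothetical stubs that replace the router, the two deep researchers, and the ResponseSynthesizer, and the keyword check is only a placeholder for the LM-based routing decision.

```python
import asyncio


def route_query(query):
    """Placeholder router: a keyword check standing in for the LM-based router."""
    keywords = ("research", "compare", "analyze")
    return "deep_research" if any(k in query.lower() for k in keywords) else "normal"


async def research_a(query):
    await asyncio.sleep(0.01)  # Simulates a slow remote research call.
    return f"[A] findings for: {query}"


async def research_b(query):
    await asyncio.sleep(0.01)  # Simulates a second, independent research call.
    return f"[B] findings for: {query}"


def synthesize(query, results):
    """Stub reporter: joins the individual results into one answer."""
    return f"Answer to '{query}':\n" + "\n".join(results)


async def run_pipeline(query):
    if route_query(query) == "deep_research":
        # Both researchers start together; gather waits until both have finished.
        results = await asyncio.gather(research_a(query), research_b(query))
        return synthesize(query, results)
    # Non-research queries skip the researchers entirely.
    return f"Simple reply to: {query}"


if __name__ == "__main__":
    print(asyncio.run(run_pipeline("research about the latest trends in AI")))
    print(asyncio.run(run_pipeline("hello")))
```

The reporter only runs after `asyncio.gather` returns, which mirrors the requirement that both research results exist before they are combined.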

Parallel Execution Benefits:

Faster Results: Both researchers run concurrently, so the total wait is roughly the duration of the slower researcher rather than the sum of both.
Diverse Perspectives: Combines different research approaches and data sources.
Comprehensive Coverage: Leverages the unique strengths of each research system.
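
To make the timing benefit concrete, the sketch below compares sequential and concurrent execution of two simulated researchers using `asyncio.sleep`. The delays and the `fake_researcher` helper are made up for illustration; real researchers will have different and less predictable latencies.

```python
import asyncio
import time


async def fake_researcher(name, delay):
    """Simulated researcher: waits `delay` seconds, then returns a label."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def run_sequential():
    """Run the two researchers one after the other and return elapsed seconds."""
    start = time.perf_counter()
    await fake_researcher("first", 0.2)
    await fake_researcher("second", 0.3)
    return time.perf_counter() - start


async def run_concurrent():
    """Run the two researchers together and return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(fake_researcher("first", 0.2), fake_researcher("second", 0.3))
    return time.perf_counter() - start


if __name__ == "__main__":
    sequential = asyncio.run(run_sequential())
    concurrent = asyncio.run(run_concurrent())
    # Sequential time is about the sum of the delays; concurrent is about the longest one.
    print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```

The saving applies to waiting time only: both researchers still run in full, so API usage and cost are not reduced by running them in parallel.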

Pipeline Architecture

┌─────────────────┐
│     Router      │  ← Determines if deep research is needed
└────────┬────────┘
         │
    ┌────▼────┐
    │ Switch  │
    └─┬─────┬─┘
      │     │
      │     └──────────────────────────┐
      │                                │
┌─────▼──────────────────┐    ┌───────▼────────┐
│  Parallel Execution    │    │ Normal Response│
│  ┌──────────────────┐  │    └────────────────┘
│  │ OpenAI Deep      │  │
│  │ Researcher       │  │
│  └──────────────────┘  │
│  ┌──────────────────┐  │
│  │ GL Open Deep     │  │
│  │ Researcher       │  │
│  └──────────────────┘  │
└─────────┬──────────────┘
          │
    ┌─────▼─────────┐
    │   Reporter    │  ← Combines both results
    │ (Synthesizer) │
    └───────────────┘

That's it! You've successfully implemented hybrid deep research with parallel execution!

Next Steps

Explore different research profiles for GLOpenDeepResearcher
Integrate with RAG pipelines by following Your First RAG Pipeline
Explore the API reference for advanced features
