This guide shows how to perform deep research using the Deep Researcher component, starting from simple, direct usage and gradually moving toward more advanced orchestration patterns.
Additionally, the component can be placed inside a Pipeline when you need additional logic, such as context preparation or routing decisions. The Pipeline orchestrates when and under what conditions deep research is invoked, but it does not define how deep research itself works internally.
Prerequisites
This example specifically requires:
completion of all setup steps listed on the Prerequisites page.
an API key set for the Deep Researcher component. Please refer to
# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" python-dotenv gllm-core gllm-generation gllm-inference gllm-pipeline
See Quickstart for the simplest way to perform deep research using the GL SDK: invoking deep research directly, without any Pipeline orchestration.
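For reference, direct invocation looks roughly like the sketch below. The `run` entry point is an assumption here (the Quickstart shows the exact call); the model name and the `query`/`event_emitter` parameters are taken from the pipeline examples later in this guide.
import asyncio

from gllm_core.event import EventEmitter
from gllm_generation.deep_researcher import OpenAIDeepResearcher

async def main() -> None:
    researcher = OpenAIDeepResearcher(model_name="o4-mini-deep-research")
    # NOTE: `run` is an assumed entry point; consult the Quickstart for the exact API.
    result = await researcher.run(
        query="research about the latest trends in AI",
        event_emitter=EventEmitter.with_print_handler(),
    )
    print(result)

if __name__ == "__main__":
    asyncio.run(main())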
Integrate in a Research Pipeline with Routing
This section demonstrates how to place the DeepResearcher component inside a Pipeline to orchestrate when it is invoked, based on the characteristics of the user query.
Here, the Pipeline is responsible for:
inspecting the incoming request
deciding whether deep research is required
routing execution accordingly
The Pipeline does not define how deep research is executed internally. Deep research is invoked as an encapsulated step, and its internal reasoning remains provider-defined.
In this example, the DeepResearcher component is used as one step within a Pipeline. The Pipeline handles routing logic and context preparation, while deep research itself remains a standalone invocation.
Run the script
How it works:
The Pipeline evaluates the user query and determines the appropriate execution path.
If deep research is required, the Pipeline invokes the deep research step.
Otherwise, the Pipeline routes the request to a simpler response path.
The Pipeline returns the final result produced by the selected path.
The deep research step is treated as an encapsulated unit; the Pipeline does not break down or modify its internal execution.
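The routing decision itself reduces to a few lines; the excerpt below is taken from the full script at the end of this section.
# The router step writes its classification into the "route" state key,
# and switch() dispatches execution to the matching branch.
conditional_step = switch(
    condition=lambda input: input["route"],
    branches={"deep_research": deep_researcher, "normal": normal_response_synthesizer},
)
deep_research_pipeline = router | conditional_step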
Deep Research Pipeline with Google Drive Integration
This section demonstrates Pipeline orchestration with additional data sources, using Google Drive as an example.
In this setup:
the Pipeline controls routing and execution flow
Google Drive access is provided via an MCP connector
deep research is invoked as an encapsulated step, with additional data available during execution
When generating the token, make sure to enable the following scopes:
userinfo.email
userinfo.profile
drive.readonly
Add to .env:
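The exact variables are not shown on this page. At minimum you need the OpenAI key from the earlier example; the GOOGLE_AUTH_TOKEN name below is illustrative and is only used to avoid hard-coding the Google OAuth token in the script.
OPENAI_API_KEY="..."
# Illustrative variable name for your Google OAuth token.
GOOGLE_AUTH_TOKEN="..."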
Implementation
In this example, Google Drive is made available to deep research through an MCP connector, while the Pipeline determines when deep research should be invoked.
The Google Drive connector extends the data available during research, but does not change the execution flow or reasoning strategy of deep research itself.
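The relevant wiring, excerpted from the full script below (the GOOGLE_AUTH_TOKEN variable name is illustrative):
# Google Drive is attached as an MCP connector and handed to the researcher as a tool.
connector = NativeTool.mcp_connector(
    name="google_drive",
    connector_id="connector_googledrive",
    auth=os.getenv("GOOGLE_AUTH_TOKEN", "<google_auth_token>"),
)
deep_researcher = step(
    component=OpenAIDeepResearcher(
        model_name="o4-mini-deep-research",
        tools=[connector],
    ),
    input_map={"query": "user_query", "event_emitter": "event_emitter"},
    output_state="result",
)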
Run the script
Benefits:
Make documents stored in Google Drive available as research context
Combine private documents with public information during research
Integrate external data sources without changing research execution logic
git clone https://github.com/gl-sdk/gen-ai-sdk-cookbook.git
cd gen-ai-sdk-cookbook/deep-research

# macOS/Linux
./setup.sh

# Windows
setup.bat
OPENAI_API_KEY="..."
import asyncio
from gllm_core.event import EventEmitter
from gllm_generation.deep_researcher import OpenAIDeepResearcher
from gllm_generation.response_synthesizer import ResponseSynthesizer
from gllm_inference.lm_invoker.openai_lm_invoker import OpenAILMInvoker
from gllm_inference.output_parser.json_output_parser import JSONOutputParser
from gllm_inference.prompt_builder import PromptBuilder
from gllm_inference.request_processor import LMRequestProcessor
from gllm_inference.schema import LMOutput
from gllm_pipeline.router import LMBasedRouter
from gllm_pipeline.steps import step, switch
from dotenv import load_dotenv
from pydantic import BaseModel

# Load OPENAI_API_KEY (and any other settings) from .env.
load_dotenv()
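# Pipeline state shared by every step; arbitrary_types_allowed lets the model
# carry the non-Pydantic EventEmitter instance.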
class DeepResearchState(BaseModel):
user_query: str
route: str | None
result: str | LMOutput | None
event_emitter: EventEmitter
class Config:
arbitrary_types_allowed = True
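# LM request processor the router uses to classify the incoming query.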
lmrp = LMRequestProcessor(
prompt_builder=PromptBuilder(
user_template="""
Based on the following user query, determine if it is a deep research query or a normal query.
- **normal**: Casual greetings, small talk, or simple conversational queries that do not require
in-depth research. Examples: "hello", "how are you", "what's the weather", "thanks", "goodbye".
- **deep_research**: Queries that require comprehensive research, multi-source analysis, or
in-depth exploration of a topic. Examples: "research the latest AI trends", "compare X vs Y",
"analyze the market for...", "what are the pros and cons of...".
Output the answer in JSON format with "route" as the key. For example:
{{"route": "deep_research"}} or {{"route": "normal"}}
Query: {text}
"""
),
lm_invoker=OpenAILMInvoker(model_name="gpt-5-nano"),
output_parser=JSONOutputParser(),
)
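# Routing step: writes the classifier's decision into the "route" state key.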
router = step(
component=LMBasedRouter(
valid_routes={"deep_research", "normal"},
lm_request_processor=lmrp,
default_route="normal",
),
input_map={"text": "user_query"},
output_state="route",
)
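# Deep research step, invoked as an encapsulated unit when route == "deep_research".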
deep_researcher = step(
component=OpenAIDeepResearcher(model_name="o4-mini-deep-research"),
input_map={"query": "user_query", "event_emitter": "event_emitter"},
output_state="result",
)
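# Lightweight response path for queries that do not require deep research.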
normal_response_synthesizer = step(
component=ResponseSynthesizer.stuff_preset(
model_id="openai/gpt-5-nano",
user_template="{query}",
),
input_map={"query": "user_query", "event_emitter": "event_emitter"},
output_state="result",
)
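# Dispatch to the branch that matches the "route" value produced by the router.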
conditional_step = switch(
condition=lambda input: input["route"],
branches={"deep_research": deep_researcher, "normal": normal_response_synthesizer},
)
deep_research_pipeline = router | conditional_step
deep_research_pipeline.state_type = DeepResearchState
async def main() -> None:
event_emitter = EventEmitter.with_print_handler()
state = DeepResearchState(
user_query="research about the latest trends in AI",
event_emitter=event_emitter,
route=None,
result=None,
)
result = await deep_research_pipeline.invoke(state)
print(result)
if __name__ == "__main__":
asyncio.run(main())
uv run 01_deep_research_pipeline.py
git clone https://github.com/gl-sdk/gen-ai-sdk-cookbook.git
cd gen-ai-sdk-cookbook/deep-research
import asyncio
import os
from gllm_core.event import EventEmitter
from gllm_generation.deep_researcher import OpenAIDeepResearcher
from gllm_generation.response_synthesizer import ResponseSynthesizer
from gllm_inference.lm_invoker.openai_lm_invoker import OpenAILMInvoker
from gllm_inference.output_parser.json_output_parser import JSONOutputParser
from gllm_inference.prompt_builder import PromptBuilder
from gllm_inference.request_processor import LMRequestProcessor
from gllm_inference.schema import LMOutput, NativeTool
from gllm_pipeline.router import LMBasedRouter
from gllm_pipeline.steps import step, switch
from dotenv import load_dotenv
from pydantic import BaseModel

# Load OPENAI_API_KEY (and any other settings) from .env.
load_dotenv()
class DeepResearchState(BaseModel):
user_query: str
route: str | None
result: str | LMOutput | None
event_emitter: EventEmitter
class Config:
arbitrary_types_allowed = True
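# Same state model and classification prompt as in the previous example.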
lmrp = LMRequestProcessor(
prompt_builder=PromptBuilder(
user_template="""
Based on the following user query, determine if it is a deep research query or a normal query.
- **normal**: Casual greetings, small talk, or simple conversational queries that do not require
in-depth research. Examples: "hello", "how are you", "what's the weather", "thanks", "goodbye".
- **deep_research**: Queries that require comprehensive research, multi-source analysis, or
in-depth exploration of a topic. Examples: "research the latest AI trends", "compare X vs Y",
"analyze the market for...", "what are the pros and cons of...".
Output the answer in JSON format with "route" as the key. For example:
{{"route": "deep_research"}} or {{"route": "normal"}}
Query: {text}
"""
),
lm_invoker=OpenAILMInvoker(
model_name="gpt-5-nano",
),
output_parser=JSONOutputParser(),
)
router = step(
component=LMBasedRouter(
valid_routes={"deep_research", "normal"},
lm_request_processor=lmrp,
default_route="normal",
),
input_map={"text": "user_query"},
output_state="route",
)
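# Google Drive MCP connector: makes Drive documents available to the researcher
# during execution without changing how deep research itself runs.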
connector = NativeTool.mcp_connector(
name="google_drive",
connector_id="connector_googledrive",
auth="<google_auth_token>",
)
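# Deep research step with the Google Drive connector attached as a native tool.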
deep_researcher = step(
component=OpenAIDeepResearcher(
model_name="o4-mini-deep-research",
tools=[connector]
),
input_map={"query": "user_query", "event_emitter": "event_emitter"},
output_state="result",
)
normal_response_synthesizer = step(
component=ResponseSynthesizer.stuff_preset(
model_id="openai/gpt-5-nano",
user_template="{query}",
),
input_map={"query": "user_query", "event_emitter": "event_emitter"},
output_state="result",
)
conditional_step = switch(
condition=lambda input: input["route"],
branches={"deep_research": deep_researcher, "normal": normal_response_synthesizer},
)
deep_research_pipeline = router | conditional_step
deep_research_pipeline.state_type = DeepResearchState
async def main() -> None:
event_emitter = EventEmitter.with_print_handler()
state = DeepResearchState(
user_query="research information from my Google Drive about AI trends",
event_emitter=event_emitter,
route=None,
result=None,
)
result = await deep_research_pipeline.invoke(state)
print(result)
if __name__ == "__main__":
asyncio.run(main())
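Run the script; the filename below is an assumption (check the cookbook for the actual name of the Google Drive example):
uv run 02_deep_research_google_drive.py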