Deep Research
Overview
This guide shows how to perform deep research using the GL SDK, starting from simple, direct usage and gradually moving toward more advanced orchestration patterns.
At its core, the GL SDK provides a DeepResearcher Component that can be used on its own to execute deep research against different providers.
Optionally, this Component can be placed inside a Pipeline when you need additional logic such as context preparation or routing decisions. The Pipeline orchestrates when and under what conditions deep research is invoked, but does not define how deep research itself works internally.
You can:
use the DeepResearcher directly for straightforward research tasks, or
compose it inside a Pipeline to build richer flows around deep research
This page demonstrates both approaches.
1. Deep Research Hello World
This section shows the simplest way to perform deep research using the GL SDK, by invoking deep research directly without any Pipeline orchestration.
It demonstrates using DeepResearcher as a GL SDK Component, focusing on the core deep research capability with minimal setup and no additional control logic.
Each example uses the same research() interface while swapping out the underlying deep research provider. This allows different providers to be used interchangeably, without changing the calling logic that invokes the Component.
Installation
To run the examples below, you only need the GL SDK packages installed and valid credentials for the deep research provider you want to use.
The following commands install the required SDK from the internal package registry. Choose the command that matches your environment.
Implementation
Below are minimal examples that perform deep research using the GL SDK.
Each example follows the same flow:
define a research query
invoke deep research using the same research() call
receive streamed progress and final results via an event emitter
The only difference between examples is the underlying deep research provider being used. The calling code and usage pattern remain the same.
How it works:
You provide a research query.
You invoke deep research via the GL SDK using the research() call.
The underlying provider executes the research and streams progress and results back to your application.
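The flow above can be sketched with a minimal, self-contained stand-in. The real GL SDK classes are not reproduced here, so everything in this snippet (the class, the on()/research() signatures, and the event names "progress" and "result") is an illustrative assumption about the pattern this page describes, not the actual gl_sdk API.

```python
# Toy stand-in mimicking the described pattern: a single research() entry
# point plus an event emitter that streams progress and a final result.
# All names are illustrative assumptions, not the real GL SDK interface.
from collections import defaultdict
from typing import Callable


class FakeDeepResearcher:
    """Minimal event-emitting stand-in for a DeepResearcher Component."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable]] = defaultdict(list)

    def on(self, event: str, handler: Callable) -> None:
        # Register a callback for a named event ("progress" or "result").
        self._handlers[event].append(handler)

    def _emit(self, event: str, payload) -> None:
        for handler in self._handlers[event]:
            handler(payload)

    def research(self, query: str) -> str:
        # A real provider would plan, search, and synthesize here; the toy
        # version just streams two progress events and a canned report.
        self._emit("progress", f"planning research for: {query}")
        self._emit("progress", "gathering sources")
        report = f"Report on: {query}"
        self._emit("result", report)
        return report


events: list[str] = []
researcher = FakeDeepResearcher()
researcher.on("progress", events.append)
researcher.on("result", lambda r: events.append(f"final: {r}"))
final = researcher.research("What is deep research?")
```

Because the calling code only depends on research() and the event subscriptions, a different provider class could be swapped in without changing this usage pattern, which is the interchangeability the section describes.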
2. Deep Research with Custom Prompt
This section shows how to influence how research results are presented, without changing how the research itself is executed.
Custom prompts allow you to:
adjust tone and writing style
provide domain-specific instructions
control formatting of the final output
Implementation
The prompt affects how the final research output is written, but the research execution and reasoning strategy remain provider-defined.
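The separation described here (prompt shapes presentation, provider defines execution) can be illustrated with a small stand-alone sketch. The function and parameter names are hypothetical, not GL SDK API; the point is that the "research findings" are fixed and only their rendering changes with the prompt.

```python
# Illustrative sketch: a custom prompt controls how fixed findings are
# presented, without touching how they were produced. The `prompt`
# parameter name is an assumption, not the verified GL SDK surface.

def render_report(findings: list[str], prompt: str) -> str:
    """Format the same findings differently depending on the prompt."""
    if "bullet" in prompt.lower():
        # Presentation choice only; the findings themselves are unchanged.
        return "\n".join(f"- {f}" for f in findings)
    return " ".join(findings)


findings = ["Costs fell 40%.", "Demand doubled."]
bullets = render_report(findings, prompt="Write as bullet points.")
prose = render_report(findings, prompt="Write as a short paragraph.")
```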
Use cases:
Adjust tone (formal, casual, technical, etc.)
Add domain-specific instructions
Format the final output in specific styles (news article, academic paper, etc.)
3. Deep Research with MCP Integration
This section shows how to provide additional data sources to deep research by supplying MCP tools at invocation time.
MCP integration allows deep research to access private or non-public data (such as enterprise systems or personal data sources) during execution. It does not change how deep research performs research or reasoning internally.
Note: MCP integration is currently only available with OpenAIDeepResearcher, based on provider support.
Prerequisites
The examples below assume that deep research is invoked directly using the GL SDK, with MCP tools passed in as part of the execution context.
Make sure you have:
MCP server URL or MCP connector credentials
For MCP connectors (such as Google Calendar), obtain an auth token from the provider
Implementation
MCP tools extend what data deep research can access, but the execution flow and research strategy remain defined by the underlying provider.
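One common way to pass MCP tools at invocation time is as a tool descriptor. The descriptor shape below mirrors common MCP conventions (server URL, auth header, allow-list of tools), but the exact field names and the tools= parameter are assumptions for illustration, not the verified GL SDK interface.

```python
# Hypothetical sketch of supplying MCP tools to a deep research call.
# Field names follow common MCP conventions but are assumptions here.

def build_mcp_tool(server_url: str, auth_token: str, allowed: list[str]) -> dict:
    """Assemble an MCP tool descriptor to pass into a research invocation."""
    return {
        "type": "mcp",
        "server_url": server_url,
        "headers": {"Authorization": f"Bearer {auth_token}"},
        # Restrict which tools the researcher may call on this server.
        "allowed_tools": allowed,
    }


tool = build_mcp_tool(
    server_url="https://mcp.example.internal/sse",
    auth_token="example-token",
    allowed=["search_documents"],
)
# researcher.research(query, tools=[tool])  # hypothetical call shape
```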
Benefits:
Provide access to private or non-public data sources
Integrate enterprise systems into the research context
Enable deep research to reference additional data during execution
4. Deep Research Pipeline with Routing
This section demonstrates how to place the DeepResearcher Component inside a Pipeline to orchestrate when it is invoked, based on the characteristics of a user query.
Here, the Pipeline is responsible for:
inspecting the incoming request
deciding whether deep research is required
routing execution accordingly
The Pipeline does not define how deep research is executed internally. Deep research is invoked as an encapsulated step, and its internal reasoning remains provider-defined.
Setup
Clone the repository
Set UV authentication and install dependencies
For Unix-based systems (Linux, macOS):
For Windows:
Prepare the .env file
Implementation
In this example, the DeepResearcher Component is used as one step within a Pipeline. The Pipeline handles routing logic and context preparation, while deep research itself remains a standalone invocation.
Run the script
How it works:
The Pipeline evaluates the user query and determines the appropriate execution path.
If deep research is required, the Pipeline invokes the deep research step.
Otherwise, the Pipeline routes the request to a simpler response path.
The Pipeline returns the final result produced by the selected path.
The deep research step is treated as an encapsulated unit; the Pipeline does not break down or modify its internal execution.
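The control flow described above can be sketched without any SDK at all. This stand-in only shows the routing decision; both the classifier heuristic and the function names are illustrative, and in the real Pipeline the deep research branch would invoke the encapsulated DeepResearcher step rather than return a tagged string.

```python
# Minimal sketch of the routing Pipeline: classify the query, then route
# either to the deep research step or to a simpler response path. The
# keyword heuristic is a toy; a real Pipeline might use an LLM classifier.

def needs_deep_research(query: str) -> bool:
    """Decide whether the query warrants a full deep research run."""
    keywords = ("compare", "analyze", "research", "investigate")
    return any(k in query.lower() for k in keywords)


def run_pipeline(query: str) -> str:
    if needs_deep_research(query):
        # Deep research is invoked as an encapsulated unit; its internal
        # reasoning stays provider-defined.
        return f"[deep-research] {query}"
    # Simpler response path for queries that don't need research.
    return f"[quick-answer] {query}"


deep = run_pipeline("Compare battery chemistries for grid storage")
quick = run_pipeline("What time is it in Tokyo?")
```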
5. Deep Research Pipeline with Google Drive Integration
This section demonstrates Pipeline orchestration with additional data sources, using Google Drive as an example.
In this setup:
the Pipeline controls routing and execution flow
Google Drive access is provided via an MCP connector
Deep research is invoked as an encapsulated step with additional data available during execution
Setup
Clone the repository (if you haven't already)
Set UV authentication and install dependencies
For Unix-based systems (Linux, macOS):
For Windows:
Prepare the .env file with Google Drive authentication. Get the auth token from the OpenAI Connector Guide.
When generating the token, make sure to enable the following scopes:
userinfo.email
userinfo.profile
drive.readonly
Add to .env:
Implementation
In this example, Google Drive is made available to deep research through an MCP connector, while the Pipeline determines when deep research should be invoked.
The Google Drive connector extends the data available during research, but does not change the execution flow or reasoning strategy of deep research itself.
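A connector-based tool descriptor differs from a plain MCP server mainly in that it names a hosted connector and carries the auth token generated in the setup step. The field names below are assumptions based on this page's description (an OpenAI connector plus an auth token with the three listed scopes), not a verified API surface.

```python
# Hypothetical sketch of describing a Google Drive MCP connector for a
# research invocation. Field names are illustrative assumptions; the
# scopes match the ones this guide says to enable when creating the token.

def gdrive_connector(auth_token: str) -> dict:
    """Describe a read-only Google Drive connector for deep research."""
    return {
        "type": "mcp",
        "connector": "google_drive",
        "auth_token": auth_token,
        "scopes": [
            "userinfo.email",
            "userinfo.profile",
            "drive.readonly",
        ],
    }


connector = gdrive_connector("example-token")
# researcher.research(query, tools=[connector])  # hypothetical call shape
```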
Run the script
Benefits:
Make documents stored in Google Drive available as research context
Combine private documents with public information during research
Integrate external data sources without changing research execution logic