This guide shows how to perform deep research using the Deep Researcher component, starting from simple, direct usage and gradually moving toward more advanced orchestration patterns.
Additionally, the component can be placed inside a Pipeline when you need additional logic, such as context preparation or routing decisions. The Pipeline orchestrates when and under what conditions deep research is invoked, but it does not define how deep research itself works internally.
Prerequisites
This example specifically requires:
completion of all setup steps listed on the Prerequisites page.
an API key set for the Deep Researcher component. Please refer to
# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" python-dotenv gllm-core gllm-generation gllm-inference gllm-pipeline
See Quickstart for the simplest way to perform deep research using the GL SDK: invoking deep research directly, without any Pipeline orchestration.
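For reference, direct invocation looks roughly like the sketch below. The `run` entry point is an assumption here (the Quickstart shows the exact call); the model name and the `query`/`event_emitter` parameters are taken from the pipeline examples later in this guide.
import asyncio

from gllm_core.event import EventEmitter
from gllm_generation.deep_researcher import OpenAIDeepResearcher

async def main() -> None:
    researcher = OpenAIDeepResearcher(model_name="o4-mini-deep-research")
    # NOTE: `run` is an assumed entry point; consult the Quickstart for the exact API.
    result = await researcher.run(
        query="research about the latest trends in AI",
        event_emitter=EventEmitter.with_print_handler(),
    )
    print(result)

if __name__ == "__main__":
    asyncio.run(main())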
Integrate in a Research Pipeline with Routing
This section demonstrates how to place the DeepResearcher component inside a Pipeline to orchestrate when it is invoked, based on the characteristics of the user query.
Here, the Pipeline is responsible for:
inspecting the incoming request
deciding whether deep research is required
routing execution accordingly
The Pipeline does not define how deep research is executed internally. Deep research is invoked as an encapsulated step, and its internal reasoning remains provider-defined.
In this example, the DeepResearcher component is used as one step within a Pipeline. The Pipeline handles routing logic and context preparation, while deep research itself remains a standalone invocation.
Run the script
How it works:
The Pipeline evaluates the user query and determines the appropriate execution path.
If deep research is required, the Pipeline invokes the deep research step.
Otherwise, the Pipeline routes the request to a simpler response path.
The Pipeline returns the final result produced by the selected path.
The deep research step is treated as an encapsulated unit; the Pipeline does not break down or modify its internal execution.
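The routing decision itself reduces to a few lines; the excerpt below is taken from the full script at the end of this section.
# The router step writes its classification into the "route" state key,
# and switch() dispatches execution to the matching branch.
conditional_step = switch(
    condition=lambda input: input["route"],
    branches={"deep_research": deep_researcher, "normal": normal_response_synthesizer},
)
deep_research_pipeline = router | conditional_step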
Deep Research Pipeline with Google Drive Integration
This section demonstrates Pipeline orchestration with additional data sources, using Google Drive as an example.
In this setup:
the Pipeline controls routing and execution flow
Google Drive access is provided via an MCP connector
deep research is invoked as an encapsulated step, with additional data available during execution
When generating the token, make sure to enable the following scopes:
userinfo.email
userinfo.profile
drive.readonly
Add to .env:
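The exact variables are not shown on this page. At minimum you need the OpenAI key from the earlier example; the GOOGLE_AUTH_TOKEN name below is illustrative and is only used to avoid hard-coding the Google OAuth token in the script.
OPENAI_API_KEY="..."
# Illustrative variable name for your Google OAuth token.
GOOGLE_AUTH_TOKEN="..."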
Implementation
In this example, Google Drive is made available to deep research through an MCP connector, while the Pipeline determines when deep research should be invoked.
The Google Drive connector extends the data available during research, but does not change the execution flow or reasoning strategy of deep research itself.
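The relevant wiring, excerpted from the full script below (the GOOGLE_AUTH_TOKEN variable name is illustrative):
# Google Drive is attached as an MCP connector and handed to the researcher as a tool.
connector = NativeTool.mcp_connector(
    name="google_drive",
    connector_id="connector_googledrive",
    auth=os.getenv("GOOGLE_AUTH_TOKEN", "<google_auth_token>"),
)
deep_researcher = step(
    component=OpenAIDeepResearcher(
        model_name="o4-mini-deep-research",
        tools=[connector],
    ),
    input_map={"query": "user_query", "event_emitter": "event_emitter"},
    output_state="result",
)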
Run the script
Benefits:
Make documents stored in Google Drive available as research context
Combine private documents with public information during research
Integrate external data sources without changing research execution logic
git clone https://github.com/gl-sdk/gen-ai-sdk-cookbook.git
cd gen-ai-sdk-cookbook/deep-research

# macOS/Linux
./setup.sh

# Windows
setup.bat
OPENAI_API_KEY="..."
import asyncio
from gllm_core.event import EventEmitter
from gllm_generation.deep_researcher import OpenAIDeepResearcher
from gllm_generation.response_synthesizer import ResponseSynthesizer
from gllm_inference.lm_invoker.openai_lm_invoker import OpenAILMInvoker
from gllm_inference.output_parser.json_output_parser import JSONOutputParser
from gllm_inference.prompt_builder import PromptBuilder
from gllm_inference.request_processor import LMRequestProcessor
from gllm_inference.schema import LMOutput
from gllm_pipeline.router import LMBasedRouter
from gllm_pipeline.steps import step, switch
from dotenv import load_dotenv
from pydantic import BaseModel

# Load OPENAI_API_KEY (and any other settings) from .env.
load_dotenv()
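# Pipeline state shared by every step; arbitrary_types_allowed lets the model
# carry the non-Pydantic EventEmitter instance.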
class DeepResearchState(BaseModel):
user_query: str
route: str | None
result: str | LMOutput | None
event_emitter: EventEmitter
class Config:
arbitrary_types_allowed = True
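# LM request processor the router uses to classify the incoming query.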
lmrp = LMRequestProcessor(
prompt_builder=PromptBuilder(
user_template="""
Based on the following user query, determine if it is a deep research query or a normal query.
- **normal**: Casual greetings, small talk, or simple conversational queries that do not require
in-depth research. Examples: "hello", "how are you", "what's the weather", "thanks", "goodbye".
- **deep_research**: Queries that require comprehensive research, multi-source analysis, or
in-depth exploration of a topic. Examples: "research the latest AI trends", "compare X vs Y",
"analyze the market for...", "what are the pros and cons of...".
Output the answer in JSON format with "route" as the key. For example:
{{"route": "deep_research"}} or {{"route": "normal"}}
Query: {text}
"""
),
lm_invoker=OpenAILMInvoker(model_name="gpt-5-nano"),
output_parser=JSONOutputParser(),
)
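# Routing step: writes the classifier's decision into the "route" state key.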
router = step(
component=LMBasedRouter(
valid_routes={"deep_research", "normal"},
lm_request_processor=lmrp,
default_route="normal",
),
input_map={"text": "user_query"},
output_state="route",
)
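# Deep research step, invoked as an encapsulated unit when route == "deep_research".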
deep_researcher = step(
component=OpenAIDeepResearcher(model_name="o4-mini-deep-research"),
input_map={"query": "user_query", "event_emitter": "event_emitter"},
output_state="result",
)
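# Lightweight response path for queries that do not require deep research.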
normal_response_synthesizer = step(
component=ResponseSynthesizer.stuff_preset(
model_id="openai/gpt-5-nano",
user_template="{query}",
),
input_map={"query": "user_query", "event_emitter": "event_emitter"},
output_state="result",
)
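# Dispatch to the branch that matches the "route" value produced by the router.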
conditional_step = switch(
condition=lambda input: input["route"],
branches={"deep_research": deep_researcher, "normal": normal_response_synthesizer},
)
deep_research_pipeline = router | conditional_step
deep_research_pipeline.state_type = DeepResearchState
async def main() -> None:
event_emitter = EventEmitter.with_print_handler()
state = DeepResearchState(
user_query="research about the latest trends in AI",
event_emitter=event_emitter,
route=None,
result=None,
)
result = await deep_research_pipeline.invoke(state)
print(result)
if __name__ == "__main__":
asyncio.run(main())
uv run 01_deep_research_pipeline.py
git clone https://github.com/gl-sdk/gen-ai-sdk-cookbook.git
cd gen-ai-sdk-cookbook/deep-research
import asyncio
import os
from gllm_core.event import EventEmitter
from gllm_generation.deep_researcher import OpenAIDeepResearcher
from gllm_generation.response_synthesizer import ResponseSynthesizer
from gllm_inference.lm_invoker.openai_lm_invoker import OpenAILMInvoker
from gllm_inference.output_parser.json_output_parser import JSONOutputParser
from gllm_inference.prompt_builder import PromptBuilder
from gllm_inference.request_processor import LMRequestProcessor
from gllm_inference.schema import LMOutput, NativeTool
from gllm_pipeline.router import LMBasedRouter
from gllm_pipeline.steps import step, switch
from dotenv import load_dotenv
from pydantic import BaseModel

# Load OPENAI_API_KEY (and any other settings) from .env.
load_dotenv()
class DeepResearchState(BaseModel):
user_query: str
route: str | None
result: str | LMOutput | None
event_emitter: EventEmitter
class Config:
arbitrary_types_allowed = True
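# Same state model and classification prompt as in the previous example.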
lmrp = LMRequestProcessor(
prompt_builder=PromptBuilder(
user_template="""
Based on the following user query, determine if it is a deep research query or a normal query.
- **normal**: Casual greetings, small talk, or simple conversational queries that do not require
in-depth research. Examples: "hello", "how are you", "what's the weather", "thanks", "goodbye".
- **deep_research**: Queries that require comprehensive research, multi-source analysis, or
in-depth exploration of a topic. Examples: "research the latest AI trends", "compare X vs Y",
"analyze the market for...", "what are the pros and cons of...".
Output the answer in JSON format with "route" as the key. For example:
{{"route": "deep_research"}} or {{"route": "normal"}}
Query: {text}
"""
),
lm_invoker=OpenAILMInvoker(
model_name="gpt-5-nano",
),
output_parser=JSONOutputParser(),
)
router = step(
component=LMBasedRouter(
valid_routes={"deep_research", "normal"},
lm_request_processor=lmrp,
default_route="normal",
),
input_map={"text": "user_query"},
output_state="route",
)
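# Google Drive MCP connector: makes Drive documents available to the researcher
# during execution without changing how deep research itself runs.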
connector = NativeTool.mcp_connector(
name="google_drive",
connector_id="connector_googledrive",
auth="<google_auth_token>",
)
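# Deep research step with the Google Drive connector attached as a native tool.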
deep_researcher = step(
component=OpenAIDeepResearcher(
model_name="o4-mini-deep-research",
tools=[connector]
),
input_map={"query": "user_query", "event_emitter": "event_emitter"},
output_state="result",
)
normal_response_synthesizer = step(
component=ResponseSynthesizer.stuff_preset(
model_id="openai/gpt-5-nano",
user_template="{query}",
),
input_map={"query": "user_query", "event_emitter": "event_emitter"},
output_state="result",
)
conditional_step = switch(
condition=lambda input: input["route"],
branches={"deep_research": deep_researcher, "normal": normal_response_synthesizer},
)
deep_research_pipeline = router | conditional_step
deep_research_pipeline.state_type = DeepResearchState
async def main() -> None:
event_emitter = EventEmitter.with_print_handler()
state = DeepResearchState(
user_query="research information from my Google Drive about AI trends",
event_emitter=event_emitter,
route=None,
result=None,
)
result = await deep_research_pipeline.invoke(state)
print(result)
if __name__ == "__main__":
asyncio.run(main())
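Run the script; the filename below is an assumption (check the cookbook for the actual name of the Google Drive example):
uv run 02_deep_research_google_drive.py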