Core design principles

Audience: Developers

Core GL Open DeepResearch Design Principles

This section describes the core design principles of GL Open DeepResearch: pipelines, modularity, composability, and the patterns that support them. For a list of core components and their roles, see Core components. Understanding these principles helps when extending the system, adding providers or tools, and reasoning about request and event flow.


Pipelines

The application is structured around clear pipelines: well-defined flows from input to output, with consistent stages and boundaries. Pipelines make behavior predictable and make it easier to add observability, error handling, and new capabilities at specific points.

Task and taskgroup pipeline

Clients create research via the task or taskgroup API. The orchestration path from API to provider has four stages:

  1. Router — Validates the request and authenticates the client (API key).

  2. Factory — Resolves the profile, creates the adapter for the profile’s provider, and optionally builds the tool list from the profile. Produces a request-scoped orchestrator (one adapter + one profile per request).

  3. Orchestrator — Holds the adapter and profile, handles timing and errors, and calls the adapter’s run().

  4. Adapter — Executes the underlying research engine (e.g. Tongyi, GPT-Researcher) with the given query, profile, tools, and optional event emitter.

For both tasks and taskgroups, this same orchestration path runs inside a background worker (see Asynchronous task pipeline below).
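
A minimal sketch of this path, assuming hypothetical names: ResearchRequest is a stand-in request model, and factory.create and conduct_research follow the descriptions above rather than a confirmed signature.

    from dataclasses import dataclass

    @dataclass
    class ResearchRequest:                  # hypothetical request model
        query: str
        profile: str | None = None          # profile name; None means "use the default profile"

    def handle_research(request: ResearchRequest, factory):
        # 1. Router: validate (authentication via API key omitted for brevity)
        if not request.query.strip():
            raise ValueError("query must not be empty")
        # 2. Factory: resolve profile, create adapter, optionally build tools
        orchestrator = factory.create(request)
        # 3-4. Orchestrator handles timing/errors and calls the adapter's run()
        return orchestrator.conduct_research(request.query)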

Asynchronous task pipeline

For async execution, the pipeline extends to a background worker and storage:

  1. Router — Accepts task creation; TaskService validates input, persists the task, and enqueues a Celery task.

  2. Celery worker — Picks up the task and calls TaskService.execute_research with the task ID, query, and profile.

  3. TaskService — Updates the task status, sets up streaming (event capture to Redis), creates an orchestrator via the same Factory, runs conduct_research, then updates the final status, stores the completion event, and runs webhooks.

So the same “Router → Factory → Orchestrator → Adapter” pipeline runs inside the worker, with the addition of task lifecycle, Redis-backed streaming, and webhooks.
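
A hypothetical sketch of the worker-side flow, based only on the steps above; the status values, collaborator names, and method signatures are assumptions, not the project's confirmed API.

    # Sketch of TaskService.execute_research as described in steps 2-3 above.
    class TaskService:
        def __init__(self, repository, factory, streaming, webhooks):
            self.repository = repository    # task persistence
            self.factory = factory          # the same OrchestratorFactory as the request path
            self.streaming = streaming      # TaskStreamingHandler: event capture to Redis
            self.webhooks = webhooks        # webhook runner

        def execute_research(self, task_id: str, query: str, profile: str | None) -> None:
            self.repository.set_status(task_id, "RUNNING")          # assumed status value
            emitter = self.streaming.capture(task_id)                # events -> Redis list
            try:
                orchestrator = self.factory.create_for(profile, emitter=emitter)
                result = orchestrator.conduct_research(query)
                self.repository.set_status(task_id, "SUCCESS")
                self.streaming.store_completion(task_id, result=result)
            except Exception as exc:
                self.repository.set_status(task_id, "FAILURE")
                self.streaming.store_completion(task_id, error=str(exc))
                raise
            finally:
                self.webhooks.run(task_id)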

Streaming event pipeline

Streaming is implemented as a capture–store–retrieve pipeline:

  1. Capture — During research execution, a StreamEventHandler and EventEmitter capture adapter events; a postprocessor (from the adapter) can transform them.

  2. Store — Events are appended to Redis lists (e.g. task_stream:{task_id}, taskgroup_stream:{taskgroup_id}) with a TTL.

  3. Retrieve — Clients consume events via SSE. The streaming handler exposes a generic get_redis_stream(redis_key, ...) that polls a Redis list and yields SSE-formatted events; get_task_stream and group stream logic use this same primitive.

Task-level and taskgroup-level streams share the same storage and retrieval pattern, so behavior and extensions (e.g. completion detection, timeouts) stay consistent.
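
A sketch of the store step, assuming the redis-py client; the task_stream:{task_id} key pattern comes from the text above, while the TTL value and JSON encoding are illustrative.

    import json
    import redis

    STREAM_TTL_SECONDS = 3600  # assumed TTL; the real value is configuration-dependent

    def store_event(client: redis.Redis, task_id: str, event: dict) -> None:
        key = f"task_stream:{task_id}"
        client.rpush(key, json.dumps(event))    # append the captured event to the list
        client.expire(key, STREAM_TTL_SECONDS)  # refresh the TTL on every write

    # The capture side (StreamEventHandler / EventEmitter) would call store_event for
    # each adapter event after the optional postprocessor has transformed it.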

Research flow inside adapters

Within an adapter, the research engine often implements its own pipeline. For example, Tongyi Deep Research:

  1. Decomposes the question into sub-problems.

  2. Iterates over rounds: reason → act (tools) → observe.

  3. Uses tools (e.g. web search, page fetch) to gather information.

  4. Synthesizes and returns an evidence-backed answer.

The orchestrator does not dictate these steps; it only invokes adapter.run(). Pipeline design inside each adapter is provider-specific.
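
A purely illustrative sketch of such a reason → act → observe loop; this is not Tongyi's actual implementation, and the llm.plan and llm.synthesize helpers are invented stand-ins.

    def run_research(query: str, tools: dict, llm, max_rounds: int = 5) -> str:
        observations: list[str] = []
        for _ in range(max_rounds):
            step = llm.plan(query, observations)         # reason: decide the next action
            if step.action == "answer":                  # enough evidence gathered
                break
            tool = tools[step.action]                    # act: e.g. web search, page fetch
            observations.append(tool(step.argument))     # observe: keep the evidence
        return llm.synthesize(query, observations)       # evidence-backed answer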


Modularity

The codebase is split into modules with clear responsibilities and stable boundaries. This keeps changes local and makes testing and replacement of parts easier.

Layered structure

  • Router layer — HTTP, validation, authentication; no business logic.

  • Service layer — Task, TaskGroup, Profile, Account: orchestration of use cases and delegation to repositories and the orchestrator.

  • Orchestrator layer — Single responsibility: run the right adapter with the right profile and tools.

  • Adapter layer — Provider-specific implementations behind a common protocol.

  • Repository layer — Data access; services depend on abstractions (e.g. base repository interfaces), not concrete DB implementations.

  • Infrastructure — Redis, Celery, streaming handler; used by services but not tied to a single domain.

Routers depend on services; services depend on orchestrator, repositories, and streaming/cache; the orchestrator depends only on the adapter protocol and domain models.
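
A minimal illustration of this layering with FastAPI dependency injection; the TaskService and get_task_service shown here are simplified stand-ins for the real container wiring.

    from fastapi import APIRouter, Depends

    router = APIRouter()

    class TaskService:                            # service layer, heavily simplified
        def create_task(self, query: str) -> dict:
            return {"query": query, "status": "PENDING"}

    def get_task_service() -> TaskService:        # normally resolved from the DI container
        return TaskService()

    @router.post("/tasks")
    def create_task(payload: dict, service: TaskService = Depends(get_task_service)):
        # Router layer: validation/auth only; business logic stays in the service.
        return service.create_task(payload["query"])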

Protocol-based adapters

Adapters are not coupled by inheritance but by structural typing: they implement the OrchestratorAdapter protocol (name, provider_type, description, streaming_postprocessor, run). As a result:

  • New providers can be added without changing orchestrator or router code.

  • The orchestrator stays provider-agnostic and only calls the protocol methods.
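
A minimal sketch of such a protocol with typing.Protocol; the attribute names follow the description above, while the run() signature and types are assumptions.

    from typing import Any, Callable, Optional, Protocol

    class OrchestratorAdapter(Protocol):
        name: str
        provider_type: str
        description: str
        streaming_postprocessor: Optional[Callable[[dict], dict]]

        def run(self, query: str, profile: Any,
                tools: Optional[list] = None, emitter: Any = None) -> Any:
            """Execute the underlying research engine for one query."""
            ...

    # Any class that provides these attributes and a compatible run() satisfies the
    # protocol; no inheritance from OrchestratorAdapter is required.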

Registries as extension points

  • OrchestratorRegistry — Registers adapter factories by provider key; the factory creates an adapter instance (e.g. from a class path). Used at startup and by the Factory when creating the orchestrator.

  • ToolRegistry — Registers tool factories by name; the factory creates a tool instance. Profiles refer to tools by name; the Factory uses create_many(tool_names) to build the list passed to the orchestrator.

Adding a provider = register an adapter. Adding a tool = register a tool factory. No change to the core request or task pipeline.
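
A hypothetical sketch of the registry shape; the create_many name comes from the text, the rest of the API is assumed.

    from typing import Any, Callable

    class Registry:
        def __init__(self) -> None:
            self._factories: dict[str, Callable[..., Any]] = {}

        def register(self, key: str, factory: Callable[..., Any]) -> None:
            self._factories[key] = factory

        def create(self, key: str, **kwargs: Any) -> Any:
            return self._factories[key](**kwargs)

        def create_many(self, keys: list[str]) -> list[Any]:
            return [self.create(key) for key in keys]

    # OrchestratorRegistry would register adapter factories by provider key at startup;
    # ToolRegistry would register tool factories by name and serve create_many(tool_names).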

Streaming and task execution

  • TaskStreamingHandler — Encapsulates “capture events and store in Redis” and “read from a Redis key and stream as SSE.” TaskService and TaskGroupService use it but do not implement Redis or SSE details.

  • get_redis_stream — Generic stream reader for any Redis list key; task stream and taskgroup stream both use it with different keys and stop conditions (e.g. completion event, “all tasks done”).

This keeps streaming logic in one place and reusable for tasks and taskgroups.
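
A sketch of what the generic reader might look like, assuming the async redis-py client; the function name and redis_key argument come from the text, while the polling loop, stop_condition callback, and SSE framing details are assumptions.

    import asyncio

    import redis.asyncio as redis

    async def get_redis_stream(client: redis.Redis, redis_key: str,
                               stop_condition, poll_interval: float = 0.5):
        index = 0
        while True:
            items = await client.lrange(redis_key, index, -1)     # only new entries
            for raw in items:
                event = raw.decode() if isinstance(raw, bytes) else raw
                yield f"data: {event}\n\n"                        # SSE framing
                if stop_condition(event):                         # e.g. completion event
                    return
            index += len(items)
            await asyncio.sleep(poll_interval)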


Composability

The system is designed so that small, well-defined pieces are combined to form requests, tasks, and streams. Composability is visible in profiles, the factory, taskgroups, and tools.

Profile as composition unit

A Profile combines:

  • provider — Which adapter to use (e.g. Tongyi, GPTR).

  • params — Provider- and use-case-specific options (e.g. llm_model, max_depth, tools).

The same orchestrator and factory work with any profile; the profile determines adapter and tools. New behavior can be added by new profiles or new params without changing the core pipeline.
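
A minimal sketch of the Profile composition unit, assuming a Pydantic model; the field names follow the text, and the example values are illustrative only.

    from typing import Any

    from pydantic import BaseModel, Field

    class Profile(BaseModel):
        name: str
        provider: str                                           # adapter key, e.g. "tongyi"
        params: dict[str, Any] = Field(default_factory=dict)    # llm_model, max_depth, tools, ...

    example_profile = Profile(
        name="deep-web-research",
        provider="tongyi",
        params={"llm_model": "example-model", "max_depth": 3, "tools": ["web_search", "page_fetch"]},
    )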

Factory as composer

OrchestratorFactory.create(request) composes the runtime for a single request:

  1. Resolves Profile from the request (by name or default).

  2. Creates the Adapter via OrchestratorRegistry from the profile’s provider.

  3. Optionally creates Tools via ToolRegistry from profile.params (e.g. params["tools"]).

  4. Builds one DeepResearchOrchestrator(adapter, profile, tools).

So: one request → one profile → one adapter + one optional tool list → one orchestrator. The factory is the only place that ties profile, adapter, and tools together for a request.
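
A hypothetical sketch of that composition, following points 1-4 above; the registry and orchestrator signatures are assumptions.

    class DeepResearchOrchestrator:
        def __init__(self, adapter, profile, tools=None):
            self.adapter, self.profile, self.tools = adapter, profile, tools

        def conduct_research(self, query: str):
            # Timing and error handling omitted; the adapter does the actual research.
            return self.adapter.run(query, self.profile, self.tools)

    class OrchestratorFactory:
        def __init__(self, profiles, adapters, tools):
            self.profiles = profiles    # profile lookup (by name or default)
            self.adapters = adapters    # OrchestratorRegistry
            self.tools = tools          # ToolRegistry

        def create(self, request):
            profile = self.profiles.resolve(request.profile)              # 1. resolve Profile
            adapter = self.adapters.create(profile.provider)              # 2. adapter for provider
            tool_names = profile.params.get("tools", [])
            tool_list = self.tools.create_many(tool_names) or None        # 3. optional tools
            return DeepResearchOrchestrator(adapter, profile, tool_list)  # 4. one orchestrator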

Taskgroups as composition of tasks

A TaskGroup is a batch of research tasks that share configuration:

  • Shared: profile, webhook (and thus provider and tools).

  • Per task: query (from the group’s query list).

TaskGroupService creates one task per query via TaskService, associating each task with the same taskgroup. Group status is derived from member task statuses (e.g. SUCCESS when all succeed, PARTIAL_FAILURE when some fail). So “taskgroup” is a composition of many tasks plus shared config and derived status, not a new kind of execution engine.
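
An illustrative derivation of group status from member task statuses; SUCCESS and PARTIAL_FAILURE come from the text, while the other status names and the ordering of checks are assumptions.

    def derive_group_status(task_statuses: list[str]) -> str:
        if any(status in ("PENDING", "RUNNING") for status in task_statuses):
            return "RUNNING"                        # still in progress
        if all(status == "SUCCESS" for status in task_statuses):
            return "SUCCESS"                        # every member task succeeded
        if any(status == "SUCCESS" for status in task_statuses):
            return "PARTIAL_FAILURE"                # some succeeded, some failed
        return "FAILURE"                            # every member task failed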

Group stream as composition of streams

get_group_stream composes multiple streams into one SSE response:

  • One asyncio task per task stream (each using the same get_redis_stream-style retrieval from task_stream:{task_id}).

  • One stream from taskgroup_stream:{taskgroup_id} for group-level events (e.g. status changes).

  • A shared queue and a completion signal (e.g. when all task streams are done) so the taskgroup stream can stop and the client receives a single, ordered stream.

So the group stream is “all task streams + taskgroup stream,” composed with a clear stop condition and ordering.
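
A minimal sketch of that composition, assuming asyncio and the generic get_redis_stream-style readers described above; the shared queue and sentinel handling are illustrative.

    import asyncio

    async def merge_streams(streams):               # streams: async generators of SSE strings
        queue: asyncio.Queue = asyncio.Queue()
        done = object()                             # per-stream completion sentinel

        async def pump(stream):
            async for event in stream:
                await queue.put(event)
            await queue.put(done)                   # signal this stream has finished

        pumps = [asyncio.create_task(pump(s)) for s in streams]
        remaining = len(pumps)
        while remaining:                            # stop once every stream has completed
            item = await queue.get()
            if item is done:
                remaining -= 1
                continue
            yield item                              # single, ordered SSE stream to the client

    # get_group_stream would pass one reader per task_stream:{task_id} plus the
    # taskgroup_stream:{taskgroup_id} reader into a merger like this.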

Tools as composed capabilities

Tools are registered by name; a profile’s params["tools"] is a list of names. The Factory calls ToolRegistry.create_many(tool_names) and passes the list to the orchestrator, which passes it to the adapter. The adapter (e.g. Tongyi) uses the tools during research. So:

  • Composition: A profile composes a set of tools by name.

  • Reuse: The same tool can appear in many profiles.

  • Extensibility: New tools are registered and then referenced in profiles; no change to the factory or orchestrator logic.
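
An illustrative stand-in for one such tool and its factory; the tool name and factory shape are assumptions that mirror the registry description above.

    import urllib.request
    from typing import Callable

    def make_page_fetch(timeout: float = 10.0) -> Callable[[str], str]:
        # Tool factory: a ToolRegistry would store this under a name such as "page_fetch".
        def page_fetch(url: str) -> str:
            # Fetch a page and return its text for the research engine to use.
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.read().decode("utf-8", errors="replace")
        return page_fetch

    # A profile then references the tool by name (e.g. params["tools"] = ["page_fetch"]),
    # and the Factory's create_many(["page_fetch"]) builds the instance list.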


Design patterns

The following patterns support the pipelines, modularity, and composability described above. For each pattern, the list notes where it appears in the codebase and what it is for.

  • Factory (OrchestratorFactory) — Builds the request-scoped orchestrator (profile + adapter + tools); isolates creation and keeps routers simple.

  • Registry (OrchestratorRegistry, ToolRegistry, ProfileRegistry) — Pluggable providers, tools, and profiles; new ones can be added without changing callers.

  • Protocol / structural typing (OrchestratorAdapter) — Loose coupling to adapters; any type that implements the protocol can be used.

  • Repository (Profile, Task, TaskGroup, and Account repositories) — Abstracts data access; services depend on interfaces, not on the DB or Redis.

  • Dependency injection (FastAPI Depends()) — Services and handlers receive their dependencies (e.g. TaskService, the streaming handler) from the container.

  • Single pipeline per request (Router → Factory → Orchestrator → Adapter) — One clear path per research run; the same path for stream, task, and taskgroup.

  • Generic stream primitive (get_redis_stream(redis_key, ...)) — One implementation of “read a list from Redis and stream it as SSE”; reused for tasks and taskgroups.

  • Event-driven streaming (Capture → Redis list → get_redis_stream → SSE) — Decouples the producer (worker) from the consumer (API); supports multiple clients and retries.


Summary

  • Pipelines: Request, task, and streaming flows are well-defined and staged; the same “orchestrator + adapter” pipeline runs for streaming and async tasks.

  • Modularity: Layers, protocol-based adapters, and registries keep responsibilities clear and extensions local (new adapters, tools, profiles).

  • Composability: Profiles combine provider and params; the factory composes profile, adapter, and tools per request; taskgroups compose tasks; group stream composes task and taskgroup streams; tools are composed by name in profiles.

Together, these principles keep the system predictable, testable, and easy to extend with new providers, tools, and execution modes (e.g. stream, task, taskgroup) without duplicating core logic.
