GLChat Pipeline
This section explains how GLChat processes a user message end-to-end, from the moment it arrives to the moment a response is streamed back.
Overview
When a user sends a message, it does not go directly to an AI model. It passes through a series of stages that handle input processing, safety, context enrichment, execution, and persistence. Understanding this flow helps you know exactly where things happen and why.
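The stages described above can be sketched as a single orchestration function. This is a minimal illustration under stated assumptions: the function and stage names (`check_guardrails`, `preprocess`, `route`, `handle_message`) are hypothetical stand-ins, not GLChat's actual API.

```python
def check_guardrails(message: str) -> bool:
    """Return True if the message passes the configured safety policies."""
    return "forbidden" not in message  # placeholder policy

def preprocess(message: str, history: list[str]) -> str:
    """Rewrite the message with conversation context (simplified)."""
    return message if not history else f"{history[-1]} -> {message}"

def route(query: str, use_agent: bool) -> str:
    """Direct the query to a pipeline or an agent based on configuration."""
    return f"agent({query})" if use_agent else f"pipeline({query})"

def handle_message(message: str, history: list[str], use_agent: bool = False) -> str:
    if not check_guardrails(message):
        return "blocked"            # blocked response, returned immediately
    query = preprocess(message, history)
    response = route(query, use_agent)
    # Persistence (chat history, memory) would happen asynchronously here.
    return response
```

Each stage is expanded in the sections below.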
Full Execution Flow
Stage Summary
Guardrails
Guardrails runs a safety check on the raw user input. If the input violates configured policies, a blocked response is returned immediately. If guardrails are not enabled or the input is safe, the message continues to preprocessing.
Preprocessing
Preprocessing enriches the raw message with full conversation context — rewriting it as a standalone query, retrieving long-term memories, and checking the response cache. If a cache hit is found, the cached response is returned immediately without running the pipeline.
Routing
After preprocessing, the system applies the routing decision that was made at the start of the request: the message is directed to a Pipeline (e.g. Standard RAG) or an Agent, depending on the user's configuration.
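Applying the decision amounts to a dispatch on the chosen route. A hypothetical sketch (the route names and handler functions are illustrative):

```python
from typing import Callable

def run_standard_rag(query: str) -> str:
    return f"RAG answer for: {query}"

def run_agent(query: str) -> str:
    return f"Agent answer for: {query}"

# Map each routing decision to its executor.
ROUTES: dict[str, Callable[[str], str]] = {
    "pipeline": run_standard_rag,
    "agent": run_agent,
}

def apply_route(decision: str, query: str) -> str:
    """Dispatch the preprocessed query to the route chosen earlier."""
    return ROUTES[decision](query)
```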
Pipeline or Agent
Pipeline — a structured, deterministic flow that retrieves context and generates a response. Standard RAG is the primary pipeline.
Agent — a flexible, multi-step executor that can call tools, reason, and loop before producing a final answer.
Both paths produce a response that is streamed to the user in real time.
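Streaming can be sketched with generators: both paths yield tokens as they are produced, and the caller forwards each token to the user immediately. This is purely illustrative; the token contents and function names are assumptions.

```python
from typing import Iterator

def pipeline_stream(query: str) -> Iterator[str]:
    """Deterministic flow: retrieve context, then generate."""
    for token in ["Retrieved", "context,", "generated", "answer."]:
        yield token  # each token is forwarded as soon as it exists

def agent_stream(query: str) -> Iterator[str]:
    """Multi-step executor: tool calls and reasoning before the final answer."""
    yield "Tool call done."
    yield "Final answer."

def stream_response(tokens: Iterator[str]) -> str:
    return " ".join(tokens)  # stand-in for writing to the client socket
```

Because both executors expose the same iterator interface, the streaming layer does not need to know which path was taken.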
Postprocessing
After the pipeline or agent finishes, the system saves the interaction — chat history and memory — asynchronously so it does not delay the response stream.
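A minimal sketch of fire-and-forget persistence, assuming a simple background thread; GLChat's actual mechanism may differ. The point is that saving history and memory does not block the response stream.

```python
import threading

chat_history: list[tuple[str, str]] = []  # stand-in for a database

def save_interaction(user_msg: str, response: str) -> None:
    chat_history.append((user_msg, response))  # stand-in for DB/memory writes

def persist_async(user_msg: str, response: str) -> threading.Thread:
    """Kick off persistence without delaying the caller."""
    t = threading.Thread(target=save_interaction, args=(user_msg, response))
    t.start()
    return t
```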