How to Debug Errors
This section focuses on identifying and resolving functional problems like application crashes, configuration errors, and runtime exceptions.
Chapter 1: Locating GLChat Logs
Before diving into debugging techniques, it's important to know where to find the logs. All logs discussed in this guide are generated by the glchat-be service.
Depending on your environment setup, these logs can be found in a few common places:
Standard Output: When running the service locally in a terminal, logs will typically be printed directly to your console.
Log Aggregation Platform (e.g., Kibana): If your organization uses a centralized logging solution, the logs from the glchat-be service will be streamed there. You can use the platform's search and filtering capabilities to isolate the logs for this service, as shown in the sketch below.
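For example, in Kibana a query like the one below narrows the view to errors from this service. The field names are assumptions that depend on how your deployment ships its logs; substitute whatever fields your index actually uses.

```
kubernetes.container.name : "glchat-be" and log.level : "ERROR"
```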
Chapter 2: The Three Methods of Debugging
When an error occurs, your investigation will start in your monitoring platform, like Sentry or Kibana. The first and most important piece of information is often the error message itself, which can be enough to solve the problem directly. If more context is needed, you will then inspect the logs. GLChat provides different levels of logging detail to help your investigation. This chapter describes the key sources of information you will use, from the initial alert to a deep dive into the application's state.
Method 1: Using Sentry (The Alert)
If you have Sentry integration configured, it will likely be your first line of defense. It captures and aggregates errors from your live environment, providing a user-friendly interface to view the exception and the traceback. This is often the fastest way to see what went wrong and where in the code the error occurred.
What Sentry provides: The main value of Sentry is the immediate alert and the captured exception, including a full traceback. This is crucial for initial diagnosis.
What might be missing: While you get the error, you may not get the full context. For example, the detailed input that was sent to the failing step might not be fully visible due to log size limitations. Sentry tells you the "what" and "where," but you often need other methods to find the "why."
Example of the Sentry UI:

Method 2: Inspecting the Standard Logs (The First Look)
If the error message from Sentry is not specific enough to solve the issue directly, your next step is to inspect the logs. The standard logs are the default output from the glchat-be service when the verbose DEBUG_STATE mode is not enabled. They provide a high-level view of the pipeline's execution and the context surrounding an error.
What to look for: In these logs, you should look for two main things:
The ERROR message itself, along with the full Python traceback.
The INFO and DEBUG logs leading up to the error. These show the sequence of steps that were executed and can help you understand the flow of the application right before it failed.
Purpose: The goal is to get a clearer picture of what the application was doing when it crashed. Sometimes, seeing which step ran last or what input it received is enough to diagnose the problem.
Example of Standard Logs:
Below is an example of what the standard logs look like during a pipeline run. You can see the start and finish of each component, along with some of the key inputs they are processing.
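The snippet that follows is a hand-written sketch rather than output copied from a running instance; the timestamps, formatting, and exact step names are illustrative, while the general shape (start/finish lines per component plus DEBUG lines for key inputs) is what you should expect to see.

```
2024-01-01 10:32:01 INFO  [pipeline] Starting step: set_use_case
2024-01-01 10:32:01 DEBUG [set_use_case] user_query="What is the refund policy?"
2024-01-01 10:32:02 INFO  [pipeline] Finished step: set_use_case
2024-01-01 10:32:02 INFO  [pipeline] Starting step: retriever
2024-01-01 10:32:02 DEBUG [retriever] query=""
2024-01-01 10:32:02 ERROR [pipeline] Step 'retriever' failed (traceback follows)
```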
How to Read a Traceback
When an error occurs, the log will include a traceback, which can look long and intimidating. The key is to read it from the bottom up. The last line usually contains the most specific error message, which is the root cause of the problem.
Let's look at a condensed version of a real traceback:
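The traceback below is a reconstruction for illustration: the file paths, line numbers, and exact messages are invented, but the shape (an original ValueError raised inside the VectorRetriever, later re-raised as a RuntimeError by the pipeline) matches the case discussed here.

```
Traceback (most recent call last):
  File "/app/glchat/pipeline/runner.py", line 142, in run_step
    result = step.run(state)
  File "/app/glchat/retrieval/vector_retriever.py", line 88, in run
    raise ValueError("VectorRetriever was called without a 'query'")
ValueError: VectorRetriever was called without a 'query'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/glchat/pipeline/runner.py", line 147, in run_step
    raise RuntimeError("Pipeline step 'retriever' failed") from exc
RuntimeError: Pipeline step 'retriever' failed
```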
In this example, even though the final error is a generic RuntimeError, by reading up from the bottom, you can find the original ValueError. This tells you the exact problem: the VectorRetriever was called without a query. This is the crucial clue you need to fix the bug.
This method is best for getting initial context when the error message alone isn't enough. However, standard logs often do not contain the full application state, which might be necessary for more complex bugs.
Method 3: The Verbose State Log (DEBUG_STATE) (The Deep Dive)
For complex issues where the standard logs aren't enough, you need a deeper look into the application's state. This is GLChat's most powerful debugging feature.
How to find the logs: In your log output, the verbose state trace for a specific conversation is contained within a block that starts with [TRACE] Conversation ID: <your_conversation_id> and ends with [/TRACE]. You can search for this string to find the relevant logs. Note that there will typically be three of these trace blocks for each request, corresponding to the preprocessing, main pipeline, and postprocessing stages.
How to enable it: Set the environment variable DEBUG_STATE to true (a local example is sketched below).
What it does: When this mode is active, a highly detailed trace of the pipeline's execution is logged. This includes the full state object before and after every single step. This allows you to see exactly how the state changes over time and to inspect every piece of data a step uses.
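Enabling the flag locally and pulling a trace block out of the captured output can look like the following sketch. The log file name is an assumption; when run in a terminal the service writes to standard output (see Chapter 1), so capture it however suits your setup.

```
export DEBUG_STATE=true
# restart glchat-be and reproduce the failing request, capturing stdout:
#   <your run command> 2>&1 | tee glchat-be.log
grep -n "\[TRACE\] Conversation ID:" glchat-be.log
```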
Example of a Verbose State Log:
The key to using this mode is to find the task and task_result entries for a specific step to see what changed. Below is a condensed snippet from a real log, showing the state before a step (task payload) and the changes it produced (task_result payload).
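The snippet below is a hand-written illustration of that structure, not a copy of a real trace: the conversation ID, the query text, and any field names other than task, task_result, user_query, and use_case_id are placeholders.

```
[TRACE] Conversation ID: 3f6a9c2e-0000-example
  ...
  task: {
    "step": "set_use_case",
    "payload": {
      "user_query": "What is the refund policy?",
      ...
    }
  }
  task_result: {
    "step": "set_use_case",
    "payload": {
      "use_case_id": "answer_question"
    }
  }
  ...
[/TRACE]
```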
Note on Logs: For a complete example of a verbose DEBUG_STATE log, see the full file here:
In this example, the set_use_case step took the user_query as input and, as shown in the task_result, it correctly determined that the use_case_id should be answer_question. The next step in the pipeline will now have access to this new value in the state. This is the fundamental technique for tracing the flow of data.
This is the most effective method for finding the root cause of difficult errors, as it gives you a complete picture of what's happening inside the pipeline.
Chapter 3: A Step-by-Step Workflow for Error Debugging
When you encounter an error, follow this workflow to resolve it efficiently.
Step 1: Start in Sentry or Kibana.
Your investigation will begin in your log aggregation platform. Find the error and review the traceback to get an initial understanding of the problem.
Analyze the error message. Before anything else, read the error message carefully. Sometimes, the error is specific enough to tell you exactly what is wrong, making local reproduction unnecessary. For example, an error like this is very clear:
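A representative message is shown below; the exact wording is illustrative and follows the missing-query ValueError from the traceback example in Chapter 2.

```
ValueError: VectorRetriever was called without a 'query'
```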
This message explicitly tells you that the retriever step was called without a query. If the fix is obvious from the message, you can proceed to implement it directly.
Check log verbosity if the error is unclear. If the error message is more generic, look through the logs associated with the error. If DEBUG_STATE was enabled in that environment, you may see the full "State before..." and "State after..." logs. If this information is present and not truncated, you might be able to diagnose the problem from here.
Step 2: Reproduce Locally (If Necessary).
You only need to reproduce the error locally if the error message is not specific and the logs in Sentry/Kibana are not detailed enough (i.e., DEBUG_STATE was not enabled or the logs were truncated).
Run the application in your local environment and take the necessary steps to trigger the same error.
Step 3: Enable DEBUG_STATE for Local Inspection.
Once you can reliably reproduce the error locally, stop the application.
Set the DEBUG_STATE environment variable: export DEBUG_STATE=true.
Run the pipeline again. Now your local logs will contain the highly detailed state information needed for a deep dive.
Step 4: Inspect the Verbose Logs for the Root Cause.
Open your local logs (or the logs in Kibana if they were complete) and find the step where the error occurred.
Look at the log entry labeled "State before...". This shows you the exact state that was passed into the failing step.
Analyze the state to find the root cause: is data missing, in the wrong format, or an unexpected value?