How to Debug Errors
This section focuses on identifying and resolving functional problems like application crashes, configuration errors, and runtime exceptions.
Chapter 1: Locating GLChat Logs
Before diving into debugging techniques, it's important to know where to find the logs. All logs discussed in this guide are generated by the glchat-be service.
Depending on your environment setup, these logs can be found in a few common places:
Standard Output: When running the service locally in a terminal, logs will typically be printed directly to your console.
Log Aggregation Platform (e.g., Kibana): If your organization uses a centralized logging solution, the logs from the glchat-be service will be streamed there. You can use the platform's search and filtering capabilities to isolate the logs for this service, as shown in the sketch below.
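For example, in Kibana a query like the one below narrows the view to errors from this service. The field names are assumptions that depend on how your deployment ships its logs; substitute whatever fields your index actually uses.

```
kubernetes.container.name : "glchat-be" and log.level : "ERROR"
```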
Chapter 2: The Three Methods of Debugging
When an error occurs, your investigation will start in your monitoring platform, like Sentry or Kibana. The first and most important piece of information is often the error message itself, which can be enough to solve the problem directly. If more context is needed, you will then inspect the logs. GLChat provides different levels of logging detail to help your investigation. This chapter describes the key sources of information you will use, from the initial alert to a deep dive into the application's state.
Method 1: Using Sentry (The Alert)
If you have Sentry integration configured, it will likely be your first line of defense. It captures and aggregates errors from your live environment, providing a user-friendly interface to view the exception and the traceback. This is often the fastest way to see what went wrong and where in the code the error occurred.
What Sentry provides: The main value of Sentry is the immediate alert and the captured exception, including a full traceback. This is crucial for initial diagnosis.
What might be missing: While you get the error, you may not get the full context. For example, the detailed input that was sent to the failing step might not be fully visible due to log size limitations. Sentry tells you the "what" and "where," but you often need other methods to find the "why."
Example of the Sentry UI:

Method 2: Inspecting the Standard Logs (The First Look)
If the error message from Sentry is not specific enough to solve the issue directly, your next step is to inspect the logs. The standard logs are the default output from the glchat-be service when the verbose DEBUG_STATE mode is not enabled. They provide a high-level view of the pipeline's execution and the context surrounding an error.
What to look for: In these logs, you should look for two main things:
The ERROR message itself, along with the full Python traceback.
The INFO and DEBUG logs leading up to the error. These show the sequence of steps that were executed and can help you understand the flow of the application right before it failed.
Purpose: The goal is to get a clearer picture of what the application was doing when it crashed. Sometimes, seeing which step ran last or what input it received is enough to diagnose the problem.
Example of Standard Logs:
Below is an example of what the standard logs look like during a pipeline run. You can see the start and finish of each component, along with some of the key inputs they are processing.
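The snippet that follows is a hand-written sketch rather than output copied from a running instance; the timestamps, formatting, and exact step names are illustrative, while the general shape (start/finish lines per component plus DEBUG lines for key inputs) is what you should expect to see.

```
2024-01-01 10:32:01 INFO  [pipeline] Starting step: set_use_case
2024-01-01 10:32:01 DEBUG [set_use_case] user_query="What is the refund policy?"
2024-01-01 10:32:02 INFO  [pipeline] Finished step: set_use_case
2024-01-01 10:32:02 INFO  [pipeline] Starting step: retriever
2024-01-01 10:32:02 DEBUG [retriever] query=""
2024-01-01 10:32:02 ERROR [pipeline] Step 'retriever' failed (traceback follows)
```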
How to Read a Traceback
When an error occurs, the log will include a traceback, which can look long and intimidating. The key is to read it from the bottom up. The last line usually contains the most specific error message, which is the root cause of the problem.
Let's look at a condensed version of a real traceback:
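The traceback below is a reconstruction for illustration: the file paths, line numbers, and exact messages are invented, but the shape (an original ValueError raised inside the VectorRetriever, later re-raised as a RuntimeError by the pipeline) matches the case discussed here.

```
Traceback (most recent call last):
  File "/app/glchat/pipeline/runner.py", line 142, in run_step
    result = step.run(state)
  File "/app/glchat/retrieval/vector_retriever.py", line 88, in run
    raise ValueError("VectorRetriever was called without a 'query'")
ValueError: VectorRetriever was called without a 'query'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/glchat/pipeline/runner.py", line 147, in run_step
    raise RuntimeError("Pipeline step 'retriever' failed") from exc
RuntimeError: Pipeline step 'retriever' failed
```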
In this example, even though the final error is a generic RuntimeError, by reading up from the bottom, you can find the original ValueError. This tells you the exact problem: the VectorRetriever was called without a query. This is the crucial clue you need to fix the bug.
This method is best for getting initial context when the error message alone isn't enough. However, standard logs often do not contain the full application state, which might be necessary for more complex bugs.
Method 3: The Verbose State Log (DEBUG_STATE) (The Deep Dive)
For complex issues where the standard logs aren't enough, you need a deeper look into the application's state. This is GLChat's most powerful debugging feature.
How to find the logs: In your log output, the verbose state trace for a specific conversation is contained within a block that starts with [TRACE] Conversation ID: <your_conversation_id> and ends with [/TRACE]. You can search for this string to find the relevant logs. Note that there will typically be three of these trace blocks for each request, corresponding to the preprocessing, main pipeline, and postprocessing stages.
How to enable it: Set the environment variable DEBUG_STATE to true (a local example is sketched below).
What it does: When this mode is active, a highly detailed trace of the pipeline's execution is logged. This includes the full state object before and after every single step. This allows you to see exactly how the state changes over time and to inspect every piece of data a step uses.
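Enabling the flag locally and pulling a trace block out of the captured output can look like the following sketch. The log file name is an assumption; when run in a terminal the service writes to standard output (see Chapter 1), so capture it however suits your setup.

```
export DEBUG_STATE=true
# restart glchat-be and reproduce the failing request, capturing stdout:
#   <your run command> 2>&1 | tee glchat-be.log
grep -n "\[TRACE\] Conversation ID:" glchat-be.log
```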
Example of a Verbose State Log:
The key to using this mode is to find the task and task_result entries for a specific step to see what changed. Below is a condensed snippet from a real log, showing the state before a step (task payload) and the changes it produced (task_result payload).
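The snippet below is a hand-written illustration of that structure, not a copy of a real trace: the conversation ID, the query text, and any field names other than task, task_result, user_query, and use_case_id are placeholders.

```
[TRACE] Conversation ID: 3f6a9c2e-0000-example
  ...
  task: {
    "step": "set_use_case",
    "payload": {
      "user_query": "What is the refund policy?",
      ...
    }
  }
  task_result: {
    "step": "set_use_case",
    "payload": {
      "use_case_id": "answer_question"
    }
  }
  ...
[/TRACE]
```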
Note on Logs: For a complete example of a verbose DEBUG_STATE log, see the full file here:
In this example, the set_use_case step took the user_query as input and, as shown in the task_result, it correctly determined that the use_case_id should be answer_question. The next step in the pipeline will now have access to this new value in the state. This is the fundamental technique for tracing the flow of data.
This is the most effective method for finding the root cause of difficult errors, as it gives you a complete picture of what's happening inside the pipeline.
Chapter 3: A Step-by-Step Workflow for Error Debugging
When you encounter an error, follow this workflow to resolve it efficiently.
Step 1: Start in Sentry or Kibana.
Your investigation will begin in your log aggregation platform. Find the error and review the traceback to get an initial understanding of the problem.
Analyze the error message. Before anything else, read the error message carefully. Sometimes, the error is specific enough to tell you exactly what is wrong, making local reproduction unnecessary. For example, an error like this is very clear:
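A representative message is shown below; the exact wording is illustrative and follows the missing-query ValueError from the traceback example in Chapter 2.

```
ValueError: VectorRetriever was called without a 'query'
```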
This message explicitly tells you that the retriever step was called without a query. If the fix is obvious from the message, you can proceed to implement it directly.
Check log verbosity if the error is unclear. If the error message is more generic, look through the logs associated with the error. If DEBUG_STATE was enabled in that environment, you may see the full "State before..." and "State after..." logs. If this information is present and not truncated, you might be able to diagnose the problem from here.
Step 2: Reproduce Locally (If Necessary).
You only need to reproduce the error locally if the error message is not specific and the logs in Sentry/Kibana are not detailed enough (i.e., DEBUG_STATE was not enabled or the logs were truncated).
Run the application in your local environment and take the necessary steps to trigger the same error.
Step 3: Enable DEBUG_STATE for Local Inspection.
Once you can reliably reproduce the error locally, stop the application.
Set the DEBUG_STATE environment variable: export DEBUG_STATE=true.
Run the pipeline again. Now your local logs will contain the highly detailed state information needed for a deep dive.
Step 4: Inspect the Verbose Logs for the Root Cause.
Open your local logs (or the logs in Kibana if they were complete) and find the step where the error occurred.
Look at the log entry labeled "State before...". This shows you the exact state that was passed into the failing step.
Analyze the state to find the root cause: is data missing, in the wrong format, or an unexpected value?