ELK
At GDP Labs, we leverage the ELK stack (Elasticsearch, Logstash, and Kibana) within Elastic Cloud as our main observability platform. This cloud-based solution offers centralized log management and monitoring for our infrastructure.
Glossary
Data View: A Kibana object that defines which Elasticsearch indices (log sources) a query searches.
KQL/Lucene: Query languages for flexible log searching.
Logstash: The data-processing pipeline that ingests, parses, and forwards logs in the ELK stack.
Bar Chart: Visualizes log volume over time for easier analysis.
Prerequisites
Ensure you have access to Elastic Cloud and the relevant Kibana data views.
Know the impacted application/service, environment (production/staging), and an approximate timeframe for the incident.
Workflow: Investigating Application Log Incidents
Identify the Incident Scope
Determine the affected application/system, environment (production/staging), and time window.
Select Data View: Choose the Correct Source/Index
In Elastic Cloud, select the appropriate data view to specify which data sources and indices you want to query.
Focus on the relevant environment—e.g., choose “GDP Labs EKS Production” for production issues.

The “Data views” dropdown lets you switch between environments such as “GDP Labs EKS Production” and “GDP Labs EKS Staging”; pick the one that matches the incident.
Set the Time Range: Filter by Incident Period
Use the time picker to narrow the query to the timeframe of the incident.
Choose between relative dates (e.g., Last 24 hours) and absolute values (a specific start and end).


The calendar icon and time settings make it easy to specify the precise timeframe when the incident occurred.
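If you prefer to pin the window in the query itself, KQL also supports range operators on date fields. A minimal sketch, assuming a hypothetical incident window on 2024-05-01 (replace with your own timestamps):
@timestamp >= "2024-05-01T10:30:00" and @timestamp <= "2024-05-01T11:30:00"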
Field-Based Filtering: Customize the Log View
Use the list of fields in the left sidebar to view all available log fields.
Select or deselect fields you want to display, such as message, kubernetes.namespace, and container.name.
The field selection panel separates popular fields from all available fields for easy navigation and discovery.
Apply Filters: Precise Log Selection
Add filters to focus your search (e.g., by namespace, pod, or error type).
Use operators like is, exists, or is one of.
The add filter interface allows you to set filter operators such as is, is not, is one of, exists, and does not exist.
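These operators map closely onto KQL, so the same conditions can be typed directly into the search bar. A hedged sketch using fields from this guide (the "staging" value is illustrative):
is: kubernetes.namespace: "production"
is not: not kubernetes.namespace: "production"
is one of: kubernetes.namespace: ("production" or "staging")
exists: container.name: *
does not exist: not container.name: *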
Advanced Querying: Using KQL or Lucene
For more complex searches, use Kibana Query Language (KQL) or Lucene syntax.
Example KQL queries:
Error logs in production
kubernetes.namespace: "production" and log.level: "error"
Logs by trace or pod
trace.id: "abcd1234"
Search log messages for “timeout”
message: "timeout"
Switch between KQL and Lucene as needed.
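For reference, here is the same error search in both syntaxes; note that Lucene expects uppercase boolean operators and uses _exists_: for existence checks:
KQL: kubernetes.namespace: "production" and log.level: "error"
Lucene: kubernetes.namespace:"production" AND log.level:"error"
KQL: container.name: *
Lucene: _exists_:container.name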
Analyze Log Volume Trends (Bar Chart)
Use the vertical bar chart to see log activity over time.
Click bars to zoom in on peak times or anomalies for deeper analysis.

Review Log List
The Documents tab displays all log entries sorted by @timestamp.
Scan for errors, warnings, or notable events around the incident.

Inspect Log Details
Click any log entry to see a detailed breakdown in table or JSON view.
Check for context such as error codes, deployment info, container, pod, user agent, etc.
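Details found here often become your next filter. For example, once an entry reveals a pod name or trace ID, you can pivot to every related log (the pod name below is hypothetical):
kubernetes.pod.name: "payment-api-7d9f" and log.level: ("error" or "warn")
trace.id: "abcd1234"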


Review Log and Field Statistics
Use the field statistics tool to quickly spot outliers, high cardinality fields, or trends.
Identify which fields are most relevant or show unusual values for this incident.

Troubleshooting: When Logs Don’t Appear
Double-check the time range, especially when switching between "Relative" and "Absolute" mode.
Confirm the correct data view/index is selected.
Loosen filters if too few logs appear; make them stricter if there are too many (see the widening sketch after this list).
Check for typos or syntax errors in your KQL/Lucene queries.
Ensure you have permission to view the relevant logs/data views.
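When loosening filters, drop the most specific term first. A sketch of progressively widening a query that returns nothing (the pod name is hypothetical):
Too strict: kubernetes.pod.name: "payment-api-7d9f" and log.level: "error" and message: "timeout"
Looser: kubernetes.pod.name: payment-api-* and log.level: "error"
Broadest: kubernetes.namespace: "production" and log.level: "error"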
Best Practices for Incident Log Analysis
Start with broad time and filter scopes, then refine as you spot patterns.
Combine field-based filters and search queries for the most accurate results (see the sketch after this list).
Always expand individual log entries for full context.
Use export/share features to collaborate (if supported in your platform).
Correlate log findings with other observability tools (traces, metrics, APM) if available.
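As one sketch of combining approaches, pin a field filter and keep the free-text query in the search bar (the leading wildcard can be slow on large indices):
Pinned filter: kubernetes.namespace is "production"
Search bar: log.level: "error" and message: *timeout*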
Quick-Reference Table: Common Log Fields
Below is a quick-reference table detailing the common log fields and their descriptions.
Field                  Description
@timestamp             The date and time the event occurred
message                The main log message
log.level              Log severity level (info, error, etc.)
container.name         Name of the Docker/Kubernetes container
kubernetes.namespace   Kubernetes namespace
kubernetes.pod.name    Kubernetes pod name
agent.type             Log collector agent type (e.g., filebeat)
Example Walkthrough
An incident occurs on a production service at 11:00.
Select “GDP Labs EKS Production” in Data Views.
Set the time range (absolute or relative) to cover the period around 11:00.
Filter logs with KQL: log.level: "error"
Click the spike in the bar chart around that time.
Review entries for error/warning patterns.
Expand specific entries for detailed context (trace ID, container, etc.).
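Putting the walkthrough together, the refined search might end up as a single query like the sketch below before you drill into individual entries (the date is a placeholder for the 11:00 window):
kubernetes.namespace: "production" and log.level: "error" and @timestamp >= "2024-05-01T10:45:00" and @timestamp <= "2024-05-01T11:15:00"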