Observability is the ability to understand what is happening inside your software by examining it from the outside. The term describes one cohesive capability, and its goal is to help you see the condition of your entire system.
Observability relies on three kinds of information: metrics, traces, and logs, often called the three pillars. When you combine them, you gain an understanding of the whole state of your system that might go unnoticed if you looked at each pillar on its own. Some observability solutions collect all of this information but present it as separate capabilities, leaving it to the observer to connect them. Observability isn’t just about monitoring these pillars one at a time; it’s also the ability to see how the pieces fit together, like a puzzle, to show you the actual state of your system.
The Three Pillars of Observability
As mentioned earlier, there are three pillars of observability: Logs, Metrics, and Traces.
Logs are the archival records of your system’s functions and errors. They are typically time-stamped and come as plain text, as binary, or in a structured format that combines text and metadata. Logs allow you to look back and see what went wrong and where within a system.
Metrics are numeric values measured over a period of time. They are often key performance indicators such as CPU usage, memory usage, latency, or anything else that provides insight into the health and performance of your system. Changes in these metrics help teams better understand how the system performs for end users. Metrics offer modern businesses a measurable means to improve the user experience.
Traces are a way to follow a user’s journey through your application. A trace documents the user’s interactions and requests within the system, starting from the user interface, through the backend systems, and back to the user once the request is processed.
This is a three-part blog series on these 3 pillars of observability. In this first part, we will dive into logs.
The First Pillar – Logs
In this part of the blog, we will go through the first pillar of Observability – Logs.
Logs consist of the structured and unstructured data a system emits while its programs run. Overall, you can think of a log as a record of events within an application. Logs help you diagnose unpredictable and irregular behavior in the components of a system.
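To make the difference between unstructured and structured logs concrete, here is a minimal sketch using Python's standard logging module. The logger names, the JSON field names, and the order_id metadata are assumptions made up for this illustration, not a required schema.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object (field names are hypothetical)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Metadata passed via `extra=` becomes an attribute on the record.
            "order_id": getattr(record, "order_id", None),
        }
        return json.dumps(payload)


def build_logger(name, stream, formatter):
    """Create an isolated logger that writes to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    return logger


plain_out = io.StringIO()
json_out = io.StringIO()

plain = build_logger(
    "shop.plain", plain_out,
    logging.Formatter("%(asctime)s %(levelname)s %(message)s"),
)
structured = build_logger("shop.json", json_out, JsonFormatter())

# Unstructured: the order number is buried inside free text.
plain.info("payment failed for order 1234")
# Structured: the same event, with the order number as searchable metadata.
structured.info("payment failed", extra={"order_id": 1234})

print(plain_out.getvalue().strip())
print(json_out.getvalue().strip())
```

The structured line is easier for a log platform to index, because the order number is a field rather than part of a sentence.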
They are relatively easy to generate. Almost all application frameworks, libraries, and languages support logging. In a distributed system, every component generates logs of actions and events at any point.
Log files contain detailed information about the system, such as a fault and the specific time it occurred. By examining the logs, you can troubleshoot your program and identify where and why the error occurred. Logs are also helpful for troubleshooting security incidents in load balancers, caches, and databases.
Logs play a crucial role in understanding your system’s performance and health. Good logging practice is essential to power a good observability platform across your system design. Monitoring involves the collection and analysis of logs and system metrics. Log analysis is the process of deriving information from these logs. To conduct a proper log analysis, you first need to generate the logs, collect them, and store them. Developers need to get better at two things: what to log and how to log it.
One problem with logging is the sheer volume of logged data and the difficulty of searching through it all efficiently. Storing and analyzing logs is expensive, so it’s essential to log only the information you need to identify and manage issues. It also helps to categorize log messages into priority buckets called logging levels, such as Error, Warn, Info, Debug, and Trace. Logging helps us understand the system better and helps us set up the necessary monitoring alerts.
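As a rough illustration of logging levels, the sketch below uses Python's standard logging module, whose built-in levels are DEBUG, INFO, WARNING, ERROR, and CRITICAL (it has no built-in Trace level). The "checkout" logger name and the messages are hypothetical. Setting the threshold to WARNING drops the lower-priority messages, which is one way to keep log volume down.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("checkout")  # hypothetical service name
logger.handlers.clear()
logger.addHandler(handler)
logger.propagate = False

# Only messages at WARNING or above are emitted.
logger.setLevel(logging.WARNING)

logger.debug("cart contents: %s", ["sku-1", "sku-2"])  # dropped
logger.info("checkout started")                        # dropped
logger.warning("payment gateway slow, retrying")       # kept
logger.error("payment failed after 3 attempts")        # kept

emitted = stream.getvalue().splitlines()
print(emitted)
# ['WARNING payment gateway slow, retrying', 'ERROR payment failed after 3 attempts']
```

In production you might run at INFO or WARNING and lower the threshold to DEBUG only while investigating an issue.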
Insights from Logs
You need to know what happened in the software to troubleshoot system or software level issues. Logs give information about what happened before, during, and after an error occurred.
A trained eye monitoring logs can tell what went wrong during a specific time segment within a particular piece of software.
Of the three pillars, logs offer the most granular level of analysis. You can use them to discover the root causes of your system’s errors and why they occurred. Many tools are available for log management.
You can then monitor logs using Grafana or Kibana or any other visualization tool.
The Logs app in Kibana helps you search, filter, and follow all the logs stored in Elasticsearch. Log panels in Grafana are also very useful when you want to see correlations between visualized data and logs at a given time. You can also filter your logs by a specific term, label, or time period.
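The kind of filtering described above (by term, label, or time period) can be sketched in plain Python. This is not the Kibana or Grafana query API; the log entries, the field names, and the filter_logs helper are all hypothetical and only illustrate the idea.

```python
from datetime import datetime

# Hypothetical structured log entries.
LOGS = [
    {"time": datetime(2024, 1, 1, 10, 0), "service": "auth",
     "level": "INFO", "message": "login ok"},
    {"time": datetime(2024, 1, 1, 10, 5), "service": "payments",
     "level": "ERROR", "message": "card declined: timeout"},
    {"time": datetime(2024, 1, 1, 10, 7), "service": "payments",
     "level": "ERROR", "message": "gateway timeout"},
    {"time": datetime(2024, 1, 1, 11, 30), "service": "payments",
     "level": "ERROR", "message": "gateway timeout"},
]


def filter_logs(entries, term=None, labels=None, start=None, end=None):
    """Return entries matching a search term, exact labels, and a time window."""
    labels = labels or {}
    result = []
    for entry in entries:
        if term is not None and term not in entry["message"]:
            continue
        if any(entry.get(key) != value for key, value in labels.items()):
            continue
        if start is not None and entry["time"] < start:
            continue
        if end is not None and entry["time"] > end:
            continue
        result.append(entry)
    return result


# Timeouts from the payments service between 10:00 and 11:00.
matches = filter_logs(
    LOGS,
    term="timeout",
    labels={"service": "payments"},
    start=datetime(2024, 1, 1, 10, 0),
    end=datetime(2024, 1, 1, 11, 0),
)
for entry in matches:
    print(entry["time"].isoformat(), entry["message"])
```

Visualization tools apply the same three kinds of constraints, only against an index rather than an in-memory list.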
Limitations of Logs
Logs show what is happening in a specific program. For companies running microservices, the issue may not lie within a given service but in how different connected functions interact. Logs alone may show the problem but not how often it has occurred. Retaining logs that go back a long time increases costs because of the storage required to keep all that information.
Similarly, spinning up new containers or instances to handle client activity increases logging and storage costs.
To address these limitations, you need to look to another of the three pillars of observability: metrics. We will cover metrics in the second part of our observability series. Stay tuned to learn more about observability.
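As noted above, individual log lines do not tell you how often a problem has occurred. To get a frequency out of logs, you have to aggregate them yourself, as in the rough sketch below. The log lines, their "timestamp level message" layout, and the count_errors helper are assumptions for this example.

```python
from collections import Counter

# Hypothetical log lines in "timestamp level message" form.
LOG_LINES = [
    "2024-01-01T10:00:00 ERROR gateway timeout",
    "2024-01-01T10:00:05 INFO checkout started",
    "2024-01-01T10:01:10 ERROR gateway timeout",
    "2024-01-01T10:02:30 ERROR card declined",
]


def count_errors(lines):
    """Count how many times each ERROR message appears."""
    counts = Counter()
    for line in lines:
        # Split into timestamp, level, and the rest of the line as the message.
        _timestamp, level, message = line.split(" ", 2)
        if level == "ERROR":
            counts[message] += 1
    return counts


error_counts = count_errors(LOG_LINES)
print(error_counts.most_common())
# [('gateway timeout', 2), ('card declined', 1)]
```

Scanning every stored line to answer "how often?" gets expensive as log volume grows, which is part of why a dedicated numeric signal is the better fit for that question.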