Three Pillars of Observability – Metrics (Part 2)

Introduction

In distributed systems, services and servers are spread across multiple clouds, while the users who consume those services keep growing in number, device of choice, and location. Having visibility into the client's experience while using the application (i.e., observability) is now a vital part of operating the applications in your infrastructure.

What Are Metrics?

A metric is a quantifiable value measured over a period of time, and it includes specific characteristics such as a timestamp, a name, KPIs, and a value. Unlike logs, metrics are structured by default, which makes them easier to query and to optimize for storage, giving you the ability to retain them for more extended periods.
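To make this concrete, here is a sketch of what a single metric data point might look like, written as YAML for readability (the field names and values are illustrative, not any particular system's schema):

| # A hypothetical single metric data point (illustrative names and values)
| name: http_requests_total
| timestamp: "2020-01-15T10:00:00Z"
| labels:
|   service: checkout
|   status: "200"
| value: 1027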

Metrics help answer some of the most fundamental questions of the IT department. Is there a performance issue affecting customers? Are employees having trouble accessing an application? Is traffic volume unusually high? Is the rate of customer churn going up?

Standard metrics include:

  1. System metrics such as CPU usage, memory usage, and disk I/O,
  2. Application metrics such as request rate, error count, and response time,
  3. Business metrics such as revenue, signups, bounce rate, cart abandonment, etc.

Different Components of Metrics

Metrics are the most valuable of the three pillars because they are generated frequently and by every module, from operating systems to applications. Correlating them can give you a complete view of an issue, but doing that correlation by hand is a huge and tedious task for human operators.

Data Collection

Most metrics are small and do not consume much space, so you can gather them cheaply and store them for an extended period. They give you a general overview of the whole system, though without deep insight into individual events.

So, metrics answer the question, “How does my system performance change over time?”

Data Storage

Most people used to run StatsD along with Graphite as the storage backend. Many now prefer Prometheus, an open-source, metrics-based monitoring system. It does one thing well: with a simple yet powerful data model and query language, it lets you analyze how your applications and infrastructure perform.
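As a minimal sketch, pointing Prometheus at a target to scrape takes only a few lines of prometheus.yml (the job name and target address below are illustrative):

| # prometheus.yml – minimal scrape configuration
| global:
|   scrape_interval: 15s              # how often Prometheus pulls metrics
| scrape_configs:
|   - job_name: 'node'                # illustrative job name
|     static_configs:
|       - targets: ['localhost:9100'] # e.g., a node_exporter endpoint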

Visualization and Reporting

I would also consider visualization part of the metrics pillar, as the two go hand in hand.

Grafana is used to visualize data scraped by sources like Prometheus, which acts as a data source for Grafana and works on a pull model. You can also use Kibana as your visualization tool; it primarily supports the Elastic Stack.
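For example, a Prometheus data source can be wired into Grafana with a small provisioning file (a sketch using Grafana's data source provisioning format; the URL assumes Prometheus is running locally on its default port):

| # e.g., provisioning/datasources/prometheus.yml in the Grafana config directory
| apiVersion: 1
| datasources:
|   - name: Prometheus
|     type: prometheus
|     access: proxy                # Grafana proxies queries to the data source
|     url: http://localhost:9090   # assumes a local Prometheus on its default port
|     isDefault: true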

And you can use Skedler to generate reports from these visualizations to share with your stakeholders.

Skedler gives you a simple and effective way to add reporting to your Elasticsearch Kibana (including Open Distro for Elasticsearch) or Grafana applications, including those deployed to Kubernetes.

You can deploy Skedler in air-gapped, private, or public cloud environments, with Docker or on a VM, on various flavors of Linux.

Skedler is easy to install, configure, and use with Kibana or Grafana. Skedler's no-code drag-and-drop UI generates PDF, CSV, or Excel reports from Kibana or Grafana in minutes and saves you up to 10 hours per week.

Try our new and improved Skedler for custom generated Grafana or Kibana reports for free!

Download Skedler

Conclusion

Metrics are the entry point to all monitoring platforms, built on data collected from CPU, memory, disk, networks, and more. As a result, they no longer belong only to operations: metrics can be created by anyone and any system in the distributed network. For instance, a developer may opt to expose application-specific data such as the number of tasks performed, the time required to complete the tasks, and their status. The objective is to link these data points across different levels of the system and define an application profile that identifies the architecture the distributed system needs. This leads to improved performance, reliability, and security system-wide.

Metrics used by development teams to identify points in the source code that need improvement can also be used by operators to assess the system requirements and capacity planning needed to support user demand, and by the wider team to track and improve the adoption and use of the application.

Installing and Configuring Skedler Reports as a Kibana Plugin in an Elasticsearch and Kibana Environment Using Docker Compose

Introduction

If you are using the ELK stack, you can now install Skedler as a Kibana plugin. The Skedler Reports plugin is available for Kibana versions 6.5.x to 7.6.x.

Let’s take a look at the steps to install Skedler Reports as a Kibana plugin.

Prerequisites:

  1. A Linux machine
  2. Docker installed
  3. Docker Compose installed

Let’s get started!

Log in to your Linux machine, update the package repository, and install Docker and Docker Compose. Then follow the steps below:

Setting Up Skedler Reports

Create a directory, say skedlerplugin.

Now, create a Docker Compose file for Skedler Reports. You will also need a Skedler Reports configuration file, reporting.yml.
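A minimal sketch of the Skedler Reports Compose service follows; the image name and config mount path are assumptions, so check the Skedler documentation for the exact values for your version:

| # docker-compose.yml – Skedler Reports service (sketch; image and paths assumed)
| version: '3'
| services:
|   reports:
|     image: skedler/reports:latest                           # assumed image name
|     container_name: reports
|     ports:
|       - "3000:3000"                                         # Skedler Reports UI
|     volumes:
|       - ./reporting.yml:/opt/skedler/config/reporting.yml   # assumed config path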

Create the Skedler Reports configuration file reporting.yml and paste in your config.

Download the reporting.yml file found here

Setting Up Elasticsearch

You also need to create an Elasticsearch configuration file, elasticsearch.yml, and a Docker Compose service for Elasticsearch, as below.
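A sketch of the Elasticsearch service, assuming a single-node cluster on the official 7.6.0 image (pick the version matching your Kibana):

|   elasticsearch:
|     image: docker.elastic.co/elasticsearch/elasticsearch:7.6.0
|     container_name: elasticsearch
|     environment:
|       - discovery.type=single-node  # single-node development cluster
|     ports:
|       - "9200:9200"
|     volumes:
|       - ./elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml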

Create the Elasticsearch configuration file elasticsearch.yml and paste in the config below.
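A minimal elasticsearch.yml for this single-node setup could be as simple as:

| # elasticsearch.yml – minimal single-node settings
| cluster.name: "docker-cluster"
| network.host: 0.0.0.0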

Setting Up Skedler Reports as Kibana Plugin

Create a directory inside skedlerplugin, say kibanaconfig.

Now, create a Dockerfile for Kibana, as shown below.

Then, copy the URL of the Skedler Reports plugin matching your exact Kibana version from here.
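A sketch of that Dockerfile, assuming Kibana 7.6.0; <skedler_plugin_url> is a placeholder for the plugin URL you just copied:

| # kibanaconfig/Dockerfile – Kibana with the Skedler Reports plugin (sketch)
| FROM docker.elastic.co/kibana/kibana:7.6.0
| # Replace <skedler_plugin_url> with the URL copied in the previous step
| RUN ./bin/kibana-plugin install <skedler_plugin_url>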

You also need to create a Docker Compose service for Kibana, as below.
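A sketch of the Kibana service, built from the Dockerfile above (the in-container path for the plugin's skedler_reports.yml is an assumption):

|   kibana:
|     build: ./kibanaconfig                # uses the Dockerfile above
|     container_name: kibana
|     ports:
|       - "5601:5601"
|     depends_on:
|       - elasticsearch
|     volumes:
|       - ./kibanaconfig/kibana.yml:/usr/share/kibana/config/kibana.yml
|       - ./kibanaconfig/skedler_reports.yml:/usr/share/kibana/plugins/skedler/skedler_reports.yml  # assumed plugin config path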

Create a Kibana configuration file kibana.yml inside the kibanaconfig folder and paste the config as below.
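A minimal kibana.yml sketch, assuming the Elasticsearch container is reachable by its Compose service name:

| # kibanaconfig/kibana.yml – minimal settings (sketch)
| server.host: "0.0.0.0"
| elasticsearch.hosts: ["http://elasticsearch:9200"]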

Create the Skedler Reports plugin configuration file skedler_reports.yml inside the kibanaconfig folder and paste the config as below.

Configure the Skedler Reports server URL in the skedler_reports_url variable. By default, the variable is set as shown below.
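For example (ip_address stands for your Skedler Reports server address):

| # kibanaconfig/skedler_reports.yml
| skedler_reports_url: "http://ip_address:3000"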

If the Skedler Reports server URL requires basic authentication (for example, when it sits behind Nginx), uncomment and configure skedler_username and skedler_password with the basic authentication credentials, as shown below.
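For instance, with illustrative credentials:

| # kibanaconfig/skedler_reports.yml – with basic authentication
| skedler_reports_url: "http://ip_address:3000"
| skedler_username: "user"
| skedler_password: "password"

Now run docker-compose up -d to bring the stack up.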

Access Skedler Reports using the IP and port, and you will see the Skedler Reports UI.

| http://ip_address:3000

Access Elasticsearch using the IP and port, and you will see the Elasticsearch JSON response.

| http://ip_address:9200

Access Kibana using the IP and port, and you will see the Kibana UI.

| http://ip_address:5601

So now the composite docker-compose.yml file will look like the one below.
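Here is a sketch of the combined file, assembled from the pieces above (the Skedler image name and mount paths carry the same assumptions as before):

| # docker-compose.yml – composite stack (sketch)
| version: '3'
| services:
|   reports:
|     image: skedler/reports:latest     # assumed image name
|     container_name: reports
|     ports:
|       - "3000:3000"
|     volumes:
|       - ./reporting.yml:/opt/skedler/config/reporting.yml
|   elasticsearch:
|     image: docker.elastic.co/elasticsearch/elasticsearch:7.6.0
|     container_name: elasticsearch
|     environment:
|       - discovery.type=single-node
|     ports:
|       - "9200:9200"
|     volumes:
|       - ./elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
|   kibana:
|     build: ./kibanaconfig
|     container_name: kibana
|     ports:
|       - "5601:5601"
|     depends_on:
|       - elasticsearch
|     volumes:
|       - ./kibanaconfig/kibana.yml:/usr/share/kibana/config/kibana.yml
|       - ./kibanaconfig/skedler_reports.yml:/usr/share/kibana/plugins/skedler/skedler_reports.yml  # assumed plugin config path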

You can simply bring the stack up and down with docker-compose up -d and docker-compose down.

Summary

Docker Compose is a useful tool for managing container stacks: it lets you manage all related containers with one single command.

Episode 8 – How to Build a Cloud-Scale Monitoring System

In Episode 8 of the Infralytics Show, Shankar interviewed Molly Struve. Molly is the Lead Site Reliability Engineer for DEV Community, an online portal designed as a place where programmers can exchange ideas to help each other. The discussion focused on two topics, “How to build a cloud-scale monitoring system” and “How to scale your Elastic Stack for cloud-scale monitoring.” 

[video_embed video=”8bzSK3EiIPw” parameters=”” mp4=”” ogv=”” placeholder=”” width=”700″ height=”400″]

How Molly started working in software engineering and cloud-scale monitoring

Molly earned an aerospace degree from MIT after originally thinking she would study software engineering. She said that since all engineering degrees provide students with the same core problem-solving skills, when she later decided to work in the software engineering field, she already had the problem-solving background she needed to make the transition. The reason she didn't end up going the aerospace route is that the industry is concentrated in California and Washington, and she is from Chicago and didn't really want to move. It's good to know that people with various educational backgrounds have still been able to find success in software engineering!

Let’s jump into the discussion of cloud-scale monitoring! Here are the key points Molly made in reference to the topics listed above.

The Interview – building a cloud-scale monitoring system

What are some of the key requirements to look for when you build out a large cloud-scale monitoring system?

When you start monitoring, you just want coverage, so you often start adding different tools, and before you know it you have 6, 7, or 8 tools doing all this monitoring. But when the time comes to use them, you have to open all these different windows in your browser just to piece together what is actually going on in your system. So one of the key things she tells people building a monitoring system is to consolidate all of the reporting. You can have different tools, but you need to consolidate the reporting in a single place, a one-stop shop where you can find all the information you need.

Alert fatigue is a big problem in many monitoring systems, so when an alert triggers, it must require an action. When you have a small team, it might seem fine to have exceptions everyone knows about, alerts you don't respond to, but as your team gets larger you have to tell every new engineer what the exceptions are, and this process simply doesn't scale. So you have to be very disciplined in responding to alerts.

The goal is to get to a point where whoever is on call, whether it’s one person, two people, or three people, can handle the error workload that is coming into the system by way of alerts. 

In the beginning, when you are setting up a monitoring system, you might have a lot of errors; you just have to fix things, and the system improves with time. The ideal metric is zero errors, so you need to be aware of when errors get to a point where they need to be addressed.

Monitoring from an infrastructure perspective is different from monitoring from a security perspective

Trying to figure out what to monitor is also very challenging. You have to set up your monitoring and adjust it as you go, depending on what perspective you are monitoring for. Knowing what to monitor is partly trial and error: if there is data you wish you had monitoring for, you can address the error and then go in and add the necessary code so the data is there in the future. After you do that a few times, you end up with a really robust system, so the next time an error occurs, all the information you need will be there, and it might take you only a few minutes to figure out what's wrong.

Beyond bringing the data together and optimizing alerting, what are the other best practices?

Another best practice is tracking monitoring history. When trying to solve the error from an alert, you will want to know what the past behavior was. Past behavior can help you debug a problem. What were you alerted about in the past and how was the problem addressed then?

Also, you have to remove all manual monitoring for your monitoring system to be truly scalable. Some systems require employees to check a dashboard every few hours, but that task is easily forgotten. You don't want to rely on someone opening a file or checking a dashboard to find a problem; the problem should automatically come to you or whoever is tasked with addressing it.

What tools did you use to automate?

At Kenna we used Datadog. It's super simple, and it integrates really easily with Ruby, which is the language I primarily work with.

Anything else important on the topic of best practices for cloud-scale monitoring?

Having the ability to mute alerts when you are in the process of fixing them is important. When a developer is trying to fix a problem, it’s distracting to have an alert going off repeatedly every half hour. Having the ability to mute an alert for a set amount of time like an hour or a day can be very helpful. 

What else is part of your monitoring stack?

The list goes on and on. You can use Honeybadger for application errors, AWS metrics for your low-level infrastructure metrics, StatusCake for your APIs to make sure your actual site is up, Elasticsearch for monitoring, and CircleCI for continuous integration. It's a large list of many different tools, but we consolidated them all through Datadog.

What kind of metrics did your management team look for?

Having a great monitoring system allows you to catch incidents and problems before they become massive problems. It’s best to be able to fix issues before the point at which you would have to alert users to the problem. You want to solve problems before they impact your user base. That way on the front-end it looks to the user like your product is 100% reliable, but it’s just because developers have a system on the backend that alerts them to problems so they can stop them before they directly impact users. Upper management obviously wants the app to run well because that’s what they are selling and the monitoring system allows for that to happen.

How big was the Elasticsearch cluster where you worked before?

The logging cluster that we used at Kenna had 10 data nodes. The cluster we used for searching client data was even bigger: a 21-node cluster.

What were some of the problems when it came to managing this large cluster?

You want to define what you are logging and make it systematic. Early on at Kenna, we would log user information ad hoc and end up with a ton of different keys, which created more work for Elasticsearch and made searching and using the data nearly impossible. To avoid this, you need to come up with a logging system by defining keys and making sure that everyone uses those keys when they are in the system logging data.

We set up our indexes by date, which is common. When you get a month out from the date on a specific index, you want to shrink it to a single shard, which decreases the resources Elasticsearch needs in order to use that index. Even further out than that, you should eventually close the index so that Elasticsearch doesn't spend any resources on it.
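As an illustration of that lifecycle, here is a simplified sketch using hypothetical index names, in Kibana Dev Tools syntax (the shrink API also requires the index's shards to be relocated to a single node first, which is omitted here):

| # Block writes on the month-old index so it can be shrunk
| PUT /logs-2019.11/_settings
| { "settings": { "index.blocks.write": true } }
|
| # Shrink it down to a single shard
| POST /logs-2019.11/_shrink/logs-2019.11-shrunk
| { "settings": { "index.number_of_shards": 1 } }
|
| # Much later, close the index so it consumes no resources
| POST /logs-2019.11-shrunk/_close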

Any other best practices for cloud-scale monitoring?

Keep your mapping strict, and that can help you avoid problems. If you are doing the searching yourself, try to use filters rather than queries. Filters run a lot faster and are easier on Elasticsearch, so you want to use them when you are searching through data.
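For example, a search that runs its conditions in filter context rather than as scoring queries (hypothetical index and field names):

| GET /logs-*/_search
| {
|   "query": {
|     "bool": {
|       "filter": [
|         { "term":  { "status": "error" } },
|         { "range": { "@timestamp": { "gte": "now-24h" } } }
|       ]
|     }
|   }
| }

Because filter clauses are not scored, Elasticsearch can cache and reuse them, which is why they run faster than full queries.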

Finally, educating your users on how to use Elasticsearch is important. If developers don't know how to use it correctly, Elasticsearch will time out. So, teach users how to search keys, analyzed fields, unanalyzed fields, and so on. This also helps your users get the targeted, accurate data they are looking for, so educating them on how to use Elasticsearch is for their benefit as well. Internal users at Kenna (who are the users being referred to here) conducted searches through Kibana. Clients interfaced with the data relevant to them (after training) through an interface the Kenna team built, which prevented clients from doing things that could take down the entire cluster.

So, are you using Elasticsearch in your current role at DEV?

DEV is currently using a paid search tool, but we hope to switch to Elasticsearch because it is open source and will give us more control over our data and how we search it.

There’s an affordable solution for achieving the best practices described

Molly described the importance of consolidating reporting, responding to alerts, avoiding alert fatigue, automating alerts and reports, and tracking monitoring history. Just two weeks prior to this interview, Shankar gave a presentation about avoiding alert fatigue, and this topic keeps coming up in discussions. Many of the points Molly made, from the importance of automating alerts and reports to the importance of consolidating reporting, are the reasons we started Skedler.

Are you looking for an affordable way to send periodic reports from elasticsearch to users when they need it? Try Skedler Reports for free! 

Do you want to automate the monitoring of elasticsearch data and notify users of anomalies in the data even when they aren’t in front of their dashboards? Sign up for a free trial of Skedler Alerts!

We hope you are enjoying our podcast so far. Happy holidays to all of our listeners. We will be taking a short break, but will be back with new episodes of The Infralytics Show in 2020!

Tabular Reports from Elastic Stack – New in Skedler Reports v4.4

We are excited to announce the release of Skedler Reports v4.4. As always, it’s packed with capabilities to help you meet compliance, audit, and snapshot reporting requirements.

Tabular PDF, Excel, CSV Reports from Kibana Data Table

If you are a security analyst or network admin looking for the list of unauthorized IP addresses connecting to your machines, Skedler can deliver the data to you as a PDF or Excel report. With just a couple of clicks, schedule a PDF and/or Excel report that uses the Kibana data table as a source, then sit back and have the reports delivered to your stakeholders automatically!

[video_embed video=”l-4JSKe9ee4″ parameters=”” mp4=”” ogv=”” placeholder=”” width=”700″ height=”400″]

Schedule Reports with Custom Time Ranges

If your customer needs a daily report that summarizes the top security events during the work hours of 9 AM – 5 PM, you can send it to them right away. Simply create a custom time range in Kibana and customize your dashboard to use this time range. In Skedler, schedule a daily report with the dashboard as a data source and you're all set!

Here is the list of additional features in the new release:

  • You can use the latest features in Elastic Stack 7.3 and Grafana 6.3 and generate reports with Skedler.
  • Users do not need administrator privileges to configure Grafana as a data source in Skedler.

Go Ahead and Try it Out

Test out the data table reports with custom time ranges in an ELK 7.3 or Grafana 6.3 environment! Start now by doing the following:

  1. Download Skedler Reports
  2. Follow the simple steps in our documentation and start generating reports.

Webinar: Save Time and Money With Automated Reports & Alerts

How do you stay up to date on the critical events in your log analytics platform? Do you spend tens of thousands of dollars and countless hours to create reports and alerts from your Elastic Stack or Grafana application?

Whatever critical scenario arises, receiving the right information at the right time can ultimately be the difference between success and failure. Staying constantly aware of every situation, whether it involves business partners, operations, customers, or employees, is therefore crucial. The faster a possible issue is identified, the faster it can be solved.

Benefits of Automation

Join us in the upcoming webinar on Tuesday, December 18th, 2018 @10AM PST to learn how Skedler, which installs in minutes, can help you save time & money with automated reports and alerts for Elastic Stack & Grafana.

Watch Our Webinar Here

You’ll learn how to quickly add reporting and alerting for Elastic Stack and Grafana while seeing how Skedler can provide a flexible framework to meet your complex monitoring requirements. Be ready with your questions and we’ll be more than happy to discuss them in the webinar Q&A session.

Watch Our Webinar Here

Graph Source: https://www.statista.com/chart/10659/risks-and-advantages-to-automation-at-work/

Skedler Update: Version 3.9 Released

Here’s everything you need to know about the new Skedler v3.9. Download the update now to take advantage of its new features for both Skedler Reports and Alerts.

What’s New With Skedler Reports v3.9

  • Support for:
    • ReadOnlyRest Elasticsearch/Kibana Security Plugin.
    • Chromium web browser for Skedler report generation.
    • Report bursting in Grafana reports if the Grafana dashboard is set with Template Variables.
    • Elasticsearch version 6.4.0 and Kibana version 6.4.0.
  • Ability to install Skedler Reports through Debian and RPM packages.
  • Simplified installation steps for Skedler Reports here.
  • Upgraded license module
    • NOTE: License reactivation is required when you upgrade Skedler Reports from an older version to the latest v3.9. Refer to this URL to reactivate the Skedler Reports license key.
    • Deactivation of Skedler license key in UI

What’s New With Skedler Alerts v3.9

  • Support for:
    • Installing Skedler Alerts via Debian and RPM packages.
    • GET method type in Webhook.
    • Elasticsearch 6.4.0.
  • Simplified installation steps for Skedler Alerts. Refer to this URL for installation guides.
  • Upgraded license module:
    • NOTE: License reactivation is required when you upgrade Skedler Alerts from an older version to the latest v3.9. Refer to this URL to reactivate the Skedler Alerts license key.
  • Deactivation of Skedler Alerts license key in UI

 

Get Skedler Reports

Download Skedler Reports

Get Skedler Alerts

Download Skedler Alerts

 
