Top 5 Open Source Kubernetes Monitoring Tools

Monitoring distributed, microservices-based environments like those running on Kubernetes is not an easy task because they require real-time attention and proactive monitoring. To meet this challenge, many companies and communities have developed open-source monitoring tools for Kubernetes.

Some tools collect metrics, others collect logs. Some are Kubernetes-native, others are more agnostic in nature. Some are data collectors while others provide an interface for operating Kubernetes. This article takes a look at five of the more popular Kubernetes monitoring tools out there. 

What Is Kubernetes?

Kubernetes (K8s) is an open-source platform for automating the deployment, scaling, and management of containerized applications. Kubernetes groups the containers that make up an application into logical units for easy discovery and management. You can deploy K8s on-premises, and in public or hybrid cloud environments.

Key features of Kubernetes include:

  • Auto-healing—replaces containers after node failures, restarts failed containers, and removes unresponsive containers.
  • Automatic packing—optimizes resource usage and ensures high availability by placing containers based on resource requirements. 
  • Load balancing and service discovery—gives each pod its own IP address, gives a set of pods a single DNS name, and load-balances traffic across them.
  • Automated rollbacks and rollouts—prevent failures due to system changes and revert to previous versions when issues occur.
  • Batch execution—manages batch and CI workloads. Kubernetes can also scale applications automatically or manually.

What to Monitor in Kubernetes

Kubernetes monitoring covers three areas: the underlying infrastructure, the Kubernetes services themselves, and the internal services running on the cluster.

Infrastructure

You should monitor all the underlying server components of the clusters since server-level problems can have an impact on workloads.

  • CPU utilization—CPU monitoring reveals both user and system consumption. Monitoring also shows the IOWait metric. IOWait is the amount of time a CPU spends idle while waiting for read and write operations to complete, which is especially relevant when running clusters in the cloud or on any network storage.
  • Disk space—keeping an eye on the available disk space is essential when running write-intensive services like a datastore or etcd. Failed writes can result in data corruption that can lead to financial losses.
  • Pod resources—make sure that the Kubernetes scheduler has all the required information about pod resources. This information is essential for your network design. You also have to assess how many nodes can fail before the remaining nodes can no longer run the workloads.
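
The CPU and disk checks above can be sketched in a few lines of Python. This is only an illustration, assuming a Linux node where the aggregate "cpu" line of /proc/stat lists counters in the order user, nice, system, idle, iowait, irq, softirq, steal. The function names and the sample values are hypothetical, not part of any monitoring tool.

```python
import shutil


def iowait_percent(before: str, after: str) -> float:
    """Return the share of CPU time spent in iowait between two samples.

    Each argument is the aggregate 'cpu' line read from /proc/stat.
    Assumed field order: user nice system idle iowait irq softirq steal.
    """
    def counters(line: str) -> list:
        parts = line.split()
        if not parts or parts[0] != "cpu" or len(parts) < 9:
            raise ValueError("expected the aggregate 'cpu' line with 8+ fields")
        # Keep the first eight counters; guest time is already part of user.
        return [int(value) for value in parts[1:9]]

    deltas = [a - b for a, b in zip(counters(after), counters(before))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def disk_free_percent(path: str = "/") -> float:
    """Return the percentage of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


# Two hypothetical samples taken a few seconds apart.
sample_before = "cpu  100 0 50 800 50 0 0 0 0 0"
sample_after = "cpu  150 0 70 1500 280 0 0 0 0 0"
print(f"iowait: {iowait_percent(sample_before, sample_after):.1f}%")
print(f"free disk: {disk_free_percent('.'):.1f}%")
```

In practice a node-level agent would export these values for you. The sketch only shows what the numbers mean and how they are derived from cumulative counters.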

Kubernetes services

You have to monitor all the components that make up a Kubernetes master or worker node. This includes etcd, kube-controller-manager, and other critical components. The monitoring system must detect failures and either fix them or send an alert.

Internal services

You can also monitor the applications directly since Kubernetes exposes internal resource metrics. Kubernetes usually tries to maintain the desired state of services. However, sometimes you need human intervention to fix complex issues in Kubernetes.

Top 5 Open Source Monitoring Tools for Kubernetes

The list below reviews some of the most popular open source monitoring tools for Kubernetes.

Prometheus

Prometheus is an open source tool used for event monitoring and alerting. Prometheus uses an HTTP pull model to record real-time metrics in a time series database. In 2016, Prometheus joined the Cloud Native Computing Foundation (CNCF), and the project graduated in 2018.

Key features include:

  • Kubernetes integration—Prometheus is the de facto standard for Kubernetes monitoring. It supports service discovery for dynamically scheduled services.
  • Multi-dimensional data model—provides a label-based, time-series database that you can query with PromQL. 
  • Alerting—alerting rules defined in Prometheus fire alerts that Alertmanager, a separate component of the Prometheus project, groups and routes as notifications through the channels you specify. This enables you to handle alerting without a third-party alerting system.
  • Pull-based metrics—Prometheus collects metrics by scraping an HTTP endpoint that each target exposes, at an interval you configure.
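
To make the pull model concrete, here is a minimal sketch of a target that exposes a /metrics endpoint in the Prometheus text exposition format, using only the Python standard library. The metric name demo_requests_total, the port, and the handler are assumptions made for illustration. A real service would more likely use an official Prometheus client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter, keyed by HTTP method.
REQUESTS_TOTAL = {"GET": 0}


def render_metrics(counters: dict) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_requests_total Requests handled, by HTTP method.",
        "# TYPE demo_requests_total counter",
    ]
    for method, value in sorted(counters.items()):
        lines.append(f'demo_requests_total{{method="{method}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        REQUESTS_TOTAL["GET"] += 1
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS_TOTAL).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block and serve /metrics until interrupted."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint. Prometheus would then scrape http://<host>:8000/metrics on its configured interval, so the application never pushes data itself.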

You can find more information in this guide about Kubernetes monitoring at scale with Prometheus and Cortex.

Grafana

Grafana is an open source tool for visualizing metrics and analyzing data in large data sets. Grafana connects with databases and data sources like PostgreSQL, MySQL, InfluxDB, Graphite, Prometheus, and Elasticsearch. Because Grafana is an open source solution, you can also develop plugins from scratch to integrate with many other data sources.

Key features include:

  • Dashboard templating—templating enables you to create dashboards that you can reuse for multiple use cases. For example, you can use the same dashboard for a production server and a test server.
  • Provisioning—you can automate Grafana setup with configuration files and scripts. For example, you can automatically spin up Grafana and a new Kubernetes cluster with a script that contains the IP address, server, and data sources preset.
  • Annotations—useful for data correlation in case something goes wrong. You can create the annotations manually by adding text to the graph or you can fetch data from any data source.
  • Alerting—you can get alerts through different channels including SMS, email, Slack or PagerDuty. If you prefer other channels of communication, you can create your own notifiers with a bit of code.
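
As an illustration of dashboard templating, the sketch below builds a dashboard definition with one template variable so the same panel can be pointed at production or test. The field names follow my understanding of the Grafana dashboard JSON model and should be checked against your Grafana version. The env label and the build_dashboard helper are hypothetical.

```python
import json


def build_dashboard(title: str, environments: list) -> dict:
    """Build a dashboard dict with an 'env' template variable (assumed schema)."""
    return {
        "title": title,
        "templating": {
            "list": [
                {
                    "name": "env",
                    "type": "custom",
                    # Custom variables take a comma-separated list of values.
                    "query": ",".join(environments),
                }
            ]
        },
        "panels": [
            {
                "title": "CPU usage ($env)",
                "type": "timeseries",
                "targets": [
                    {
                        # $env is substituted by Grafana at render time.
                        # The 'env' label is an assumption about your metrics.
                        "expr": 'sum(rate(container_cpu_usage_seconds_total{env="$env"}[5m]))'
                    }
                ],
            }
        ],
    }


dashboard = build_dashboard("Cluster overview", ["production", "test"])
print(json.dumps(dashboard, indent=2))
```

The resulting JSON could be stored in version control and loaded through provisioning, which keeps a single dashboard definition for every environment.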

Fluentd

Fluentd is an open source data collector that separates data sources from backend systems by providing a unified logging layer in between. The logging layer enables you to collect many types of logs as they are generated.

Key features include:

  • JSON data structure—enables you to unify all log data processing aspects such as buffering, filtering, and outputting logs across different sources.
  • Pluggable architecture—a flexible plugin system enables you to extend the functionality of Fluentd by connecting multiple data sources and outputs. 
  • System resources—a Fluentd instance runs on 30-40MB of memory and can process 13K events per second. You can use the Fluent Bit lightweight forwarder if you have tighter memory requirements.
  • Reliability—supports file-based and memory buffering to prevent data loss on nodes. In addition, you can set up Fluentd to support high availability and robust failover.
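
Structured logs are easiest to collect when the application already emits them as JSON. The following is a minimal sketch, assuming the application writes one JSON object per line to standard output and that a collector such as Fluentd is configured separately to read and parse it. The field names and the make_logger helper are illustrative choices, not a Fluentd requirement.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def make_logger(name: str = "demo-app", stream=sys.stdout) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


log = make_logger()
log.info("order %s processed", 42)
```

Because every line is valid JSON with the same keys, the collector can filter and route records by field instead of relying on regular expressions.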

cAdvisor

cAdvisor collects, processes, and exports performance and resource usage information about running containers. cAdvisor has native support for Kubernetes because it is integrated into the Kubelet binary. 

Key features include:

  • Auto-discovery—automatically discovers all containers in a given node and collects statistics like memory, CPU, network, and filesystem usage.
  • Overall machine usage—provides the overall machine usage by analyzing the ‘root’ container on the machine.
  • Storage plugins—exports stats to different storage plugins like Elasticsearch and InfluxDB.
  • Web-UI—you can view metrics on a Web-UI that shows live information about all containers on the machine.

Jaeger

Jaeger is an open source distributed tracing system used for monitoring and troubleshooting complex distributed systems, such as microservices running on Kubernetes.

Key features include:

  • High scalability—designed to have no Single Points of Failure (SPOF) and scale with business needs.
  • Multiple storage options—supports two open source NoSQL databases, Elasticsearch and Cassandra. Jaeger also provides simple in-memory storage for testing.
  • Cloud-native deployment—supports different configuration methods, including environment variables, command-line options, and configuration files. Kubernetes deployment is supported by Kubernetes templates, Kubernetes operator, and Helm charts.
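
For readers new to tracing, the sketch below shows the basic data model that tracing systems build on: a trace is a set of spans that share a trace ID, and each span records its parent. This is a conceptual illustration only. The Span and Tracer classes are hypothetical and are not Jaeger's API. Real applications would instrument their code with a tracing client library and export spans to Jaeger.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One timed operation inside a trace."""
    trace_id: str
    span_id: str
    operation: str
    parent_id: Optional[str] = None
    start: float = field(default_factory=time.time)
    end: Optional[float] = None

    def finish(self) -> None:
        self.end = time.time()


class Tracer:
    """Collects spans in memory; a real tracer would export them."""

    def __init__(self) -> None:
        self.spans: List[Span] = []

    def start_span(self, operation: str, parent: Optional[Span] = None) -> Span:
        # A child span inherits the trace ID of its parent.
        trace_id = parent.trace_id if parent else uuid.uuid4().hex
        span = Span(
            trace_id=trace_id,
            span_id=uuid.uuid4().hex[:16],
            operation=operation,
            parent_id=parent.span_id if parent else None,
        )
        self.spans.append(span)
        return span


tracer = Tracer()
root = tracer.start_span("GET /checkout")
child = tracer.start_span("query orders", parent=root)
child.finish()
root.finish()
for span in tracer.spans:
    print(span.operation, span.trace_id, span.parent_id)
```

Following the parent links across services is what lets a tracing backend reconstruct the full path of a request and show where time was spent.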

Conclusion

This is just a partial list of available open-source Kubernetes monitoring tools, but it is a good place to start if you are designing your own Kubernetes environment. These five tools are easy to test and deploy. You just need to set up a small sandbox environment and evaluate whether these tools are what you need.

Gilad David Maayan is a technology writer who has worked with over 150 technology companies including SAP, Samsung NEXT, NetApp and Imperva, producing technical and thought leadership content that elucidates technical solutions for developers and IT leadership.

LinkedIn: https://www.linkedin.com/in/giladdavidmaayan