Best Practices for Monitoring Kubernetes using Grafana

Category: IT Technology · Published: 6 years ago

Microservices and containers have taken the tech industry by storm. Kubernetes is one of the tools that has evolved to manage these new aspects of software development. It is an open-source system for automating deployment, scaling, and management of containerized applications. One of the biggest challenges that organizations face when adopting Kubernetes is performing monitoring tasks in this dynamic environment.

Traditional monitoring strategies don’t work for containerized applications, since containers are ephemeral and are difficult to troubleshoot. When you add container orchestration to this mix, managing your application’s underlying infrastructure and taking care of its operational aspects at scale become very challenging. Having an efficient strategy for centralized metrics and monitoring dashboards is key to successfully running your applications in Kubernetes.

Grafana is an open-source data visualization and analytics tool for time-series data, and it can be used to monitor your Kubernetes cluster. It can query a large number of data stores and helps users visualize, alert on, and understand their metrics. Grafana runs on all major operating systems, and users access it through a web browser.

This article looks at some best practices for monitoring your Kubernetes cluster with Grafana. We’ll examine how Grafana’s dashboards use metrics to give you in-depth insights into the health and performance of your Kubernetes cluster, nodes, pods, and containers.

Grafana in the Kubernetes Monitoring Architecture

Grafana dashboards are centralized places to check real-time metrics. They are critical to monitoring both your applications and your infrastructure. You can leverage Kubernetes metrics in Grafana to get complete visibility into the state of your Kubernetes cluster and ensure that your services are running as expected.

Metrics that you can use Grafana dashboards to monitor include:

Kubernetes cluster resource utilization (CPU/memory on cluster, node, pod and container level)
Actual CPU/memory usage of Kubernetes cluster nodes
Health status of individual Kubernetes nodes
Available resources on individual Kubernetes nodes
Requested usage vs. actual usage of resources
Pod health and availability
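
The "requested usage vs. actual usage" item above boils down to a simple ratio. The following is a minimal Python sketch of that comparison, not something taken from Grafana itself: the `PodUsage` structure, the sample figures, and the 0.3 and 0.9 thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    """CPU figures for one pod, in cores (hypothetical sample structure)."""
    name: str
    cpu_requested: float  # sum of the pod's container CPU requests
    cpu_used: float       # observed average CPU usage over some window


def request_utilization(pod: PodUsage) -> float:
    """Return actual usage as a fraction of the requested amount."""
    if pod.cpu_requested <= 0:
        raise ValueError(f"{pod.name} has no CPU request set")
    return pod.cpu_used / pod.cpu_requested


def classify(pod: PodUsage, low: float = 0.3, high: float = 0.9) -> str:
    """Label a pod using illustrative thresholds (assumed; tune per workload)."""
    ratio = request_utilization(pod)
    if ratio < low:
        return "overprovisioned"
    if ratio > high:
        return "underprovisioned"
    return "ok"


# Made-up sample data standing in for values read from a dashboard.
pods = [
    PodUsage("api", cpu_requested=1.0, cpu_used=0.15),
    PodUsage("worker", cpu_requested=0.5, cpu_used=0.49),
    PodUsage("cache", cpu_requested=0.25, cpu_used=0.125),
]
report = {p.name: classify(p) for p in pods}
```

A panel that plots this ratio per pod makes it easy to spot workloads whose requests are far from what they actually consume.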

Which Kubernetes Metrics to Monitor

From a Kubernetes monitoring standpoint, there are two types of metrics available: system-level metrics and application-level metrics. System-level metrics can be fetched from core Kubernetes sources such as cAdvisor, Metrics Server, and the Kubernetes API server. They can be supplemented through third-party integrations such as Prometheus Node Exporter (host-level metrics) and kube-state-metrics (state of Kubernetes objects). Application-level metrics come from the services themselves, typically by instrumenting them so that a monitoring solution such as Prometheus can collect the data. The Kubernetes Monitoring Architecture design document covers this in more detail.

The following three lists contain important Kubernetes metrics that you should monitor:

Cluster Metrics

Cluster level overview of workloads deployed
Cluster CPU usage: used vs. total
Cluster memory usage: used vs. total (default memory requests and limits can be configured per namespace, for example with the memory-defaults.yaml file in the default-mem-example namespace)
Cluster file system usage: used vs. total
Cluster network I/O pressure
Cluster health (pod status, pod restarts, pod throttling)
Overview of nodes, pods, and containers

Node Metrics

Health check for master nodes—API server, scheduler, controller, etc.
Degradation of master nodes
Number of nodes available for serving pods
Node CPU utilization
Node memory usage
Node disk space available for placing pods
Node disk I/O usage
Node network traffic (in and out)—receive and transmit
Node network traffic errors
Node network traffic drop

Pod/Container Metrics

Resource allocation for pods
Pods which are either underprovisioned or overprovisioned
Number of running pods in the cluster
Healthy vs. unhealthy pods in the cluster
Percentage of throttled containers
Number of container restarts that have occurred
Number of persistent volumes in a failed or pending state
Container CPU and memory utilization (requests and limits for each pod or container are configured in its manifest, for example memory-defaults-pod.yaml)
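
The "percentage of throttled containers" item is usually derived from two counters. The sketch below assumes the cAdvisor counters `container_cpu_cfs_throttled_periods_total` and `container_cpu_cfs_periods_total` are being scraped; the helper names and the sample values are hypothetical, and the arithmetic is shown in Python only to make the calculation explicit.

```python
def counter_delta(start: float, end: float) -> float:
    """Increase of a monotonically increasing counter over a window.

    If the counter went backwards, assume it was reset and restarted from 0.
    """
    return end - start if end >= start else end


def throttled_percent(throttled_periods: float, total_periods: float) -> float:
    """Percentage of CFS scheduling periods in which a container was throttled."""
    if total_periods <= 0:
        return 0.0
    return 100.0 * throttled_periods / total_periods


# Two samples of each counter, taken at the start and end of a window.
# The numbers are invented for the example.
periods = counter_delta(12_000, 12_600)  # 600 periods elapsed
throttled = counter_delta(300, 450)      # 150 of them were throttled
pct = throttled_percent(throttled, periods)
```

A sustained high value here often means the container's CPU limit is too low for its workload, which is worth correlating with the request and limit settings.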

Troubleshooting with Kubernetes and Grafana

Grafana dashboards are excellent resources for data visualization, and they provide meaningful insights into the metrics collected from various data sources. These dashboards can be beneficial in a number of troubleshooting scenarios, such as the following:

Correlating cluster instability and performance degradation issues with resource planning—requests vs. limits.
Visualizing container restarts that might indicate a problem with your application.
Correlating throttled pods or unhealthy pod states with I/O wait times and memory spikes on nodes.
Correlating unhealthy pod states or throttled pods with CPU utilization.
Determining the source of I/O waits by correlating I/O wait spikes with disk or network spikes using the disk I/O and network stats.
Monitoring Kubernetes nodes and identifying workload bottlenecks.

From an application perspective, using the RED metrics (Request rate, Error rate, and Duration) for instrumenting the services running in Kubernetes is critical for investigating any performance issue. Leveraging Grafana’s built-in alerting capabilities can make it easy to notify teams when business thresholds are breached.
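
To make the RED metrics concrete, here is a small Python sketch that computes all three from a batch of request records. It is an illustration under stated assumptions: the `RequestRecord` type is hypothetical, errors are taken to be HTTP 5xx responses, and duration is reported as a nearest-rank percentile. In practice these values would normally come from your instrumentation library and metrics backend.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestRecord:
    """One handled request (hypothetical structure for this example)."""
    duration_ms: float
    status: int


def red_summary(records: List[RequestRecord], window_seconds: float,
                quantile: float = 0.95) -> Dict[str, float]:
    """Compute Rate, Errors, and Duration for requests seen in one window."""
    if not records:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "duration_ms": 0.0}
    # Assumption: only server-side failures (5xx) count as errors.
    errors = sum(1 for r in records if r.status >= 500)
    durations = sorted(r.duration_ms for r in records)
    # Nearest-rank percentile: the smallest value covering `quantile` of requests.
    rank = max(1, math.ceil(quantile * len(durations)))
    return {
        "rate_per_s": len(records) / window_seconds,
        "error_ratio": errors / len(records),
        "duration_ms": durations[rank - 1],
    }


# Ten requests observed in a five-second window, two of which failed.
records = [
    RequestRecord(duration_ms=10.0 * i, status=500 if i in (4, 9) else 200)
    for i in range(1, 11)
]
summary = red_summary(records, window_seconds=5)
```

Each of the three values maps naturally to one panel, and the error ratio and duration are the usual candidates for alert thresholds.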

How to Add Data Sources in Grafana

Grafana fetches information from data sources and displays it in graphs. These data sources are the storage backends for your time series data. Grafana supports several data sources out of the box, including Prometheus, InfluxDB, MySQL, Elasticsearch, AWS CloudWatch, and Azure Monitor.

When building your dashboard, you can combine data from multiple data sources into a single dashboard. However, each panel is tied to a specific data source. There is a query editor which allows you to write queries against your data stores in order to provide visualizations of the metrics. You can choose from a number of visualization options and apply them to your panels.
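
Data sources can also be registered programmatically instead of through the UI. The sketch below only builds the JSON body that could be sent to Grafana's data source HTTP API; it does not send anything. The field names reflect that API as commonly documented, but treat them as assumptions and check them against your Grafana version. The Prometheus URL is a made-up in-cluster address.

```python
import json


def prometheus_datasource(name: str, url: str, is_default: bool = False) -> dict:
    """Build a request body describing a Prometheus data source."""
    return {
        "name": name,
        "type": "prometheus",
        "url": url,
        # "proxy" means the Grafana server, not the browser, queries the backend.
        "access": "proxy",
        "isDefault": is_default,
    }


# Hypothetical service address; replace with wherever Prometheus runs.
payload = prometheus_datasource(
    "Prometheus",
    "http://prometheus.monitoring.svc:9090",
    is_default=True,
)
body = json.dumps(payload, indent=2)
```

Keeping this definition in code or configuration means every environment gets the same data source under the same name, which panels can then rely on.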

Below is a screenshot showing the data sources that are currently officially supported by Grafana:

Figure: Choose your data source in Grafana

How to Build a Grafana Dashboard

Setting up a dashboard in Grafana is very straightforward. Dashboards consist of panels that can fetch information from various underlying data sources. By default, Grafana comes with a variety of panels, such as Graph, Singlestat, Heatmap, and Table. You can also install panel plugins that add new visualizations for both time-series and non-time-series data.

You can organize your panels into rows, and you can drag and drop them across the dashboard. In addition, you can customize your panel’s look and feel from a wide range of available visualizations, and you can display data in a format that works best for your use case.

Figure: Choose your Kubernetes Grafana visualization

Tips and Tricks for Building a Grafana Kubernetes Dashboard

Keep your dashboards simple. Do not add too much information to a single dashboard, and try to limit the number of panels on a dashboard. Ideally, each panel should display a single metric, such as CPU, memory, or disk space. More graphs do not indicate better dashboards. At the end of the day, the key metrics on your dashboard should be easy to understand and actionable.

Ensure that dashboards are consistent by design. A simple trick to make consistent dashboards is to use the same layout and visualizations for all of them. If your dashboards are built differently for your various services, it can be confusing and difficult to make correct decisions during troubleshooting.

Design dashboards with the audience and their requirements in mind. The development team will need a detailed dashboard with less aggregation and more diagnostics for troubleshooting purposes. Management might be interested in an aggregated dashboard that shows a high-level picture of all the services and their SLA/SLI/SLO. Make sure your dashboards are configured to help your staff with their decision-making processes.

Tag your dashboards. Once your teams start creating dashboards for their services, you are likely to end up with a lot of dashboards. Tagging your dashboards is critical for organizing and grouping them.
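
Simplicity and tagging are easy to check mechanically once dashboards exist as JSON. The following is a rough Python sketch of such a check. It assumes the exported dashboard JSON has top-level `panels` and `tags` lists, and the limit of 12 panels is an arbitrary example value, not a Grafana recommendation.

```python
from typing import List


def lint_dashboard(dashboard: dict, max_panels: int = 12) -> List[str]:
    """Return a list of human-readable problems found in one dashboard."""
    problems = []
    panels = dashboard.get("panels", [])
    if len(panels) > max_panels:
        problems.append(f"{len(panels)} panels exceeds limit of {max_panels}")
    if not dashboard.get("tags"):
        problems.append("no tags set")
    return problems


# Two invented dashboards: one crowded and untagged, one small and tagged.
crowded = {"title": "Everything", "panels": [{"id": i} for i in range(14)]}
tidy = {"title": "Node CPU", "tags": ["kubernetes", "nodes"], "panels": [{"id": 1}]}

crowded_problems = lint_dashboard(crowded)
tidy_problems = lint_dashboard(tidy)
```

A check like this can run in the same pipeline that deploys dashboards, so untagged or overloaded dashboards are caught before they reach the cluster.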

Leverage open-source dashboards from Grafana. There is no need to reinvent the wheel. A large and active community works on this technology stack, and it is likely that someone else has already created a solution for your problem. Take a look at the official and community-built dashboards that Grafana publishes before building your own.

Leverage Grafana plugins as extension points. Apart from the core plugins, a large number of third-party plugins can be integrated with your Grafana dashboards to enhance the visualization of data. Plugins currently come in three types: panel, data source, and app.

Source control your Grafana dashboards. One of the biggest challenges with dashboard maintenance is version drift. You can save Grafana dashboards as JSON files in source control and deploy them to the Kubernetes cluster via an automated build pipeline. This will ensure that the Grafana dashboards are consistent across all Kubernetes clusters in all environments.
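
Exported dashboard JSON tends to produce noisy diffs, because some fields change on every save or differ between Grafana instances. A minimal Python sketch of a normalization step is shown here. It assumes that the top-level `id` and `version` fields are the volatile ones and that `uid` is the stable identifier you want to keep; adjust the list for your setup.

```python
import json

# Assumption: these top-level fields vary per instance or per save.
VOLATILE_KEYS = ("id", "version")


def normalize_dashboard(raw: str) -> str:
    """Return a stable text form of a dashboard export for committing."""
    dashboard = json.loads(raw)
    for key in VOLATILE_KEYS:
        dashboard.pop(key, None)
    # Sorted keys and fixed indentation keep diffs limited to real changes.
    return json.dumps(dashboard, indent=2, sort_keys=True) + "\n"


# Two invented exports of the same dashboard from different instances.
export_a = '{"id": 7, "version": 3, "uid": "k8s-nodes", "title": "Nodes", "panels": []}'
export_b = '{"title": "Nodes", "panels": [], "uid": "k8s-nodes", "version": 9, "id": 42}'

same = normalize_dashboard(export_a) == normalize_dashboard(export_b)
```

With this in place, a change in source control corresponds to an actual change in the dashboard rather than to a re-save.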

Switch into the query-focused Explore mode for troubleshooting issues. In the Explore workflow, you can iterate on a query without modifying the queries in your existing dashboards. Once the query is final, you can build a new dashboard around it or update an existing one.

Take advantage of template variables to create dynamic and reusable dashboards. Doing so gives you the ability to monitor a large number of components (set as template variables) with one centralized dashboard. Since you don’t need to hardcode the application name, server name, etc. in your metrics query, maintenance of the dashboards is much easier.
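
As an illustration of the template variable tip, the sketch below defines a `$namespace` variable and one panel query that uses it. The variable definition mirrors the shape of a dashboard's templating entry, and its query assumes a Prometheus data source with kube-state-metrics installed (for `kube_pod_info`). The `preview` helper is purely hypothetical: it imitates Grafana's substitution locally with Python's `string.Template` so the effect is visible.

```python
from string import Template

# Variable definition: values are the namespaces seen on kube_pod_info.
NAMESPACE_VARIABLE = {
    "name": "namespace",
    "type": "query",
    "query": "label_values(kube_pod_info, namespace)",
}

# One panel query that serves every namespace the variable can take.
PANEL_EXPR = (
    'sum(rate(container_cpu_usage_seconds_total{namespace="$namespace"}[5m])) by (pod)'
)


def preview(expr: str, **variables: str) -> str:
    """Imitate variable interpolation locally for a quick sanity check."""
    return Template(expr).substitute(**variables)


rendered = preview(PANEL_EXPR, namespace="production")
```

Because the namespace is no longer hardcoded, the same panel definition can be reused across teams and environments by changing only the selected variable value.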

Boost dashboard performance with lazy loading of panels. There’s no need to load all the panels in a dashboard when you first open it. Instead, as you scroll down the dashboard, the queries for each panel are executed against the backend data store only when that panel comes into view. This feature is enabled by default starting with v6.2.

Conclusion

Having a consolidated observability tool for cloud native applications is incredibly helpful. Grafana is headed towards bringing “the three pillars of observability” (metrics, logging and tracing) into a single experience. This will be a game-changer in the monitoring landscape, since users can bring in their favorite observability tooling by adding different data sources to Grafana dashboards.
