How to Do Effective Infrastructure Monitoring for Linux with Grafana

Post Details

Company

Grafana Labs

Date Published

Oct. 9, 2019

Author

Julie Dam

Word Count

1,617

Language

English

Hacker News Points

-

Source URL

grafana.com/blog/how-to-do-effective-infrastructure-monitoring-for-linux-with-grafana

Summary

Grafana Labs monitors the infrastructure behind its GKE clusters with Prometheus for metrics, Loki for logs, and Jaeger for distributed tracing. The core of the setup is the Prometheus node exporter, which collects hardware and operating system metrics from Linux hosts. The strategy favors alerting over watching dashboards, so alerts need to be meaningful and actionable. System metrics such as CPU and disk utilization are covered by alerting rules and matching visualizations, and the post recommends defining these alerts in Jsonnet-based libraries. It also covers more advanced techniques: using the node_pressure metrics (pressure stall information) as a measure of CPU saturation, and using the node exporter's textfile collector to track maintenance jobs. Grafana Labs credits GitLab's infrastructure-monitoring practices as inspiration, particularly the way GitLab organizes its monitoring dashboards. Beyond alerting, the same data supports capacity planning and gives an overview of the application infrastructure.
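
The textfile collector mentioned in the summary reads metric files with a .prom extension from a directory and exposes their contents alongside the node exporter's own metrics. A minimal sketch of how a maintenance job might report its last run is shown below. The metric names, the job_name label, and the helper functions are hypothetical and not taken from the original post; the file is written to a temporary name and then renamed so the collector never reads a half-written file.

```python
import os
import tempfile
import time
from typing import Optional


def render_metrics(job: str, success: bool, finished_at: float) -> str:
    """Render two gauges in the Prometheus text exposition format.

    The metric names and the job_name label are illustrative assumptions.
    """
    lines = [
        "# HELP maintenance_job_last_run_timestamp_seconds Unix time the job last finished.",
        "# TYPE maintenance_job_last_run_timestamp_seconds gauge",
        f'maintenance_job_last_run_timestamp_seconds{{job_name="{job}"}} {finished_at:.0f}',
        "# HELP maintenance_job_last_run_success 1 if the last run succeeded, else 0.",
        "# TYPE maintenance_job_last_run_success gauge",
        f'maintenance_job_last_run_success{{job_name="{job}"}} {1 if success else 0}',
    ]
    return "\n".join(lines) + "\n"


def write_textfile(directory: str, job: str, success: bool,
                   finished_at: Optional[float] = None) -> str:
    """Atomically write <directory>/<job>.prom and return its path."""
    if finished_at is None:
        finished_at = time.time()
    target = os.path.join(directory, f"{job}.prom")
    # The temporary file lives in the same directory so the rename stays on
    # one filesystem; its .tmp suffix means the collector ignores it.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=f".{job}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(render_metrics(job, success, finished_at))
        # mkstemp creates the file as 0600; the exporter may run as another user.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, target)  # atomic replacement of any previous file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return target
```

A job would call write_textfile as its final step, passing the directory that the node exporter was started with via --collector.textfile.directory. An alert could then fire when the timestamp gauge grows too old or the success gauge is 0.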