Company
Date Published
Author
Alexis Lê-Quôc
Word count
1991
Language
English
Hacker News points
None

Summary

Metrics collected from monitored systems fall into two categories: work metrics and resource metrics. Work metrics capture the top-level health of a system by measuring its useful output, while resource metrics help reconstruct a detailed picture of a system's state. Good monitoring data is well-understood, granular, tagged by scope, and long-lived. The collected data can be used to generate alerts, diagnose issues, and investigate problems. To maximize the data's value, instrument everything, collect as many relevant metrics and events as possible, and retain them at full granularity for a sufficient amount of time.
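
As a rough illustration of the work/resource split and of tagging by scope, the sketch below models a single metric data point and filters points by category and tags. All names here (MetricKind, MetricPoint, select, total, the metric names, and the tag keys) are hypothetical and chosen for the example; they are not taken from the article or from any particular monitoring library.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class MetricKind(Enum):
    WORK = "work"          # measures the system's useful output
    RESOURCE = "resource"  # helps reconstruct the system's state


@dataclass
class MetricPoint:
    name: str
    value: float
    timestamp: float       # Unix seconds; raw points are kept, not rolled up
    kind: MetricKind
    tags: Dict[str, str] = field(default_factory=dict)  # scope, e.g. service or region


def select(points: List[MetricPoint],
           kind: Optional[MetricKind] = None,
           **scope: str) -> List[MetricPoint]:
    """Return the points that match the given kind and every tag in scope."""
    return [
        p for p in points
        if (kind is None or p.kind is kind)
        and all(p.tags.get(key) == val for key, val in scope.items())
    ]


def total(points: List[MetricPoint]) -> float:
    """Sum the values of the given points."""
    return sum(p.value for p in points)


# Hypothetical sample data: two work metrics and one resource metric.
points = [
    MetricPoint("requests.count", 120.0, 1700000000.0, MetricKind.WORK,
                {"service": "web", "region": "us-east"}),
    MetricPoint("requests.count", 80.0, 1700000000.0, MetricKind.WORK,
                {"service": "web", "region": "eu-west"}),
    MetricPoint("cpu.utilization", 0.65, 1700000000.0, MetricKind.RESOURCE,
                {"service": "web", "region": "us-east"}),
]

# Aggregate work metrics across every region of the "web" service.
web_requests = total(select(points, kind=MetricKind.WORK, service="web"))

# Narrow the scope to one region, across both kinds of metric.
us_east_points = select(points, region="us-east")
```

Because each point carries its tags and stays at full granularity, the same stored data can be sliced by service, by region, or by any other scope after the fact, which is what makes it useful for both alerting and later investigation.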