Content Deep Dive
Introducing recovery thresholds for metric alerts
Blog post from Datadog
Post Details
Company
Date Published
Author
Rachel Windzberg
Word Count
364
Language
English
Hacker News Points
-
Summary
The post tackles a common alerting problem: an unstable metric that hovers around its alert threshold makes a monitor flap, triggering and resolving repeatedly. Datadog's recovery thresholds address this by applying hysteresis, the same principle a thermostat uses to regulate temperature. A monitor gets one value at which it triggers and a separate value at which it resolves, so an alert enters the "recovered" state only once the metric has passed the recovery threshold. This reduces noise and distraction from flapping monitors and increases confidence that an issue is actually resolved. Recovery thresholds can be set when creating a monitor, through either the UI or the API, with separate thresholds for recovery from the alert state and from the warning state.
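The hysteresis idea can be illustrated with a small simulation. This is a minimal sketch, not Datadog's implementation: the function names, the sample CPU values, the thresholds of 90 and 80, and the choice of an "above" monitor that recovers when the value is at or below the recovery threshold are all assumptions made for illustration.

```python
from typing import Iterable, List, Optional


def monitor_states(
    values: Iterable[float],
    alert_threshold: float,
    recovery_threshold: Optional[float] = None,
) -> List[str]:
    """Return the monitor state ("OK" or "ALERT") after each value.

    Hypothetical model of an "above" monitor: it alerts when a value
    exceeds alert_threshold. With no recovery threshold, it resolves as
    soon as the value is no longer above the alert threshold.
    """
    if recovery_threshold is None:
        recovery_threshold = alert_threshold

    state = "OK"
    states = []
    for value in values:
        if state == "OK" and value > alert_threshold:
            state = "ALERT"
        elif state == "ALERT" and value <= recovery_threshold:
            # Assumed recovery condition: at or below the recovery threshold.
            state = "OK"
        states.append(state)
    return states


def count_alerts(states: List[str]) -> int:
    """Count OK -> ALERT transitions, i.e. how many notifications fire."""
    count = 0
    previous = "OK"
    for state in states:
        if state == "ALERT" and previous == "OK":
            count += 1
        previous = state
    return count


# Illustrative CPU readings that hover around 90% before dropping.
cpu = [85, 92, 89, 91, 88, 93, 87, 75, 70]

without_recovery = monitor_states(cpu, alert_threshold=90)
with_recovery = monitor_states(cpu, alert_threshold=90, recovery_threshold=80)

print(count_alerts(without_recovery))  # 3 separate alerts (flapping)
print(count_alerts(with_recovery))     # 1 alert, resolved only at 75
```

With the same data, the monitor without a recovery threshold fires three times, while the one with a recovery threshold of 80 fires once and stays in alert until the metric has clearly dropped.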