Company
Date Published
Author
Guy Fighel
Word count
671
Language
English
Hacker News points
None

Summary

If you’ve ever been an on-call SRE, you’re familiar with alert fatigue: the burned-out feeling that creeps in after responding to alert after alert from the many services and tools across your stack. The phenomenon is exhausting, and constant pages also limit your ability to focus on other work, even if you’re simply clicking “acknowledge” (“acking”). Research has shown that people can lose up to 40% of their productive time to brief context switches. Many of the alerts behind these never-ending streams of pages are neither urgent nor important, and require no human action. They come from all kinds of tools in your production system and tend to be acked quickly and then ignored, since there usually isn’t an underlying actionable issue. Low-priority alerts indicate problems that may eventually need attention but sit low on the current priority list. Flapping alerts can feel like playing whack-a-mole, and unrelated issues sometimes get lost in the pile of notifications. Duplicate and correlated alerts, such as those caused by redundant monitoring configuration or complex system issues, take more time to investigate and can build frustration. An AIOps platform like New Relic AI can help tackle alert noise across your stack by providing machine learning-driven filters and logic to correlate and prioritize incidents, reducing pager fatigue and helping teams stay focused on important issues.
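
To make two of the noise categories above concrete, here is a minimal rule-based sketch in Python that labels incoming alerts as duplicates, flapping, or worth a page. It is an illustration only: it is not how New Relic AI works, it uses fixed thresholds rather than machine learning, and the `Alert` fields, the `classify_noise` function, and the window values are all hypothetical names and numbers chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: float  # seconds since an arbitrary start (hypothetical field)
    source: str       # monitoring tool that fired the alert (hypothetical field)
    check: str        # name of the failing check (hypothetical field)


def classify_noise(alerts, dedup_window=60.0, flap_window=600.0, flap_threshold=3):
    """Return (alert, label) pairs; label is 'duplicate', 'flapping', or 'page'.

    Assumed heuristics, not a vendor algorithm:
    - duplicate: same check seen again within dedup_window seconds,
      regardless of which tool reported it
    - flapping: the check has fired flap_threshold or more times
      (duplicates excluded) within flap_window seconds
    """
    last_seen = {}             # check -> timestamp of the most recent alert
    fires = defaultdict(list)  # check -> timestamps of non-duplicate alerts
    labelled = []

    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = alert.check
        previous = last_seen.get(key)
        last_seen[key] = alert.timestamp

        if previous is not None and alert.timestamp - previous <= dedup_window:
            labelled.append((alert, "duplicate"))
            continue

        # Keep only the fires that still fall inside the flap window.
        recent = [t for t in fires[key] if alert.timestamp - t <= flap_window]
        recent.append(alert.timestamp)
        fires[key] = recent

        label = "flapping" if len(recent) >= flap_threshold else "page"
        labelled.append((alert, label))

    return labelled
```

In this sketch only alerts labelled `page` would reach a human; the others could be grouped under the incident that already exists for the same check. A real platform would also correlate alerts across different checks, which this example does not attempt.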