
How We Created a Heartbeat Test for Monitoring Incident Intelligence

Blog post from New Relic

Post Details
Company: New Relic
Author: Shy Peleg, Director of Software Engineering, Applied Intelligence
Word Count: 825
Language: English
Summary

After the accidental deletion of a critical Google Cloud Platform resource caused an unexpected outage in New Relic's Incident Intelligence product, the team implemented a smoke testing system to prevent similar incidents. The system uses New Relic Alerts and synthetics to simulate incidents: it periodically inserts messages into the pipeline and checks that they come through, and it triggers an alert if an expected signal is not received. Shy Peleg, Director of Software Engineering at New Relic, explains that this proactive check gives customers an additional layer of protection and improves the reliability of New Relic's services, and he stresses the importance of detecting and addressing potential system failures swiftly.
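
The insert-and-check pattern summarized above can be illustrated with a small, self-contained sketch. It is not New Relic's implementation and does not use New Relic Alerts or synthetics APIs: the `HeartbeatMonitor` class, its `send` callback, and the 60-second timeout are all hypothetical names and values chosen for illustration. The idea is to record when each synthetic message enters the pipeline, clear it when it is observed at the far end, and report any message that has been outstanding for too long.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HeartbeatMonitor:
    """Hypothetical heartbeat checker: injects synthetic messages into a
    pipeline and reports the ones that never arrive at the other end."""

    # Assumption: `send` pushes a message ID into the pipeline under test.
    send: Callable[[str], None]
    # Assumption: a heartbeat not seen within this window counts as lost.
    timeout_seconds: float = 60.0
    # Injectable clock so the logic can be tested without sleeping.
    clock: Callable[[], float] = time.monotonic
    # Maps each outstanding heartbeat ID to the time it was sent.
    pending: Dict[str, float] = field(default_factory=dict)

    def emit(self) -> str:
        """Insert one synthetic message and remember when it was sent."""
        heartbeat_id = f"heartbeat-{uuid.uuid4()}"
        self.pending[heartbeat_id] = self.clock()
        self.send(heartbeat_id)
        return heartbeat_id

    def acknowledge(self, heartbeat_id: str) -> None:
        """Call when the message is observed at the end of the pipeline."""
        self.pending.pop(heartbeat_id, None)

    def overdue(self) -> List[str]:
        """Return IDs still outstanding after the timeout, in send order."""
        now = self.clock()
        return [
            heartbeat_id
            for heartbeat_id, sent_at in self.pending.items()
            if now - sent_at > self.timeout_seconds
        ]
```

In a setup like the one described, a scheduled synthetic check would play the role of `emit`, and a non-empty result from `overdue` would correspond to the alert that fires when the expected signal is missing.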