How We Created a Heartbeat Test for Monitoring Incident Intelligence
Blog post from New Relic
An unexpected outage in New Relic's Incident Intelligence product, caused by the accidental deletion of a critical Google Cloud Platform resource, led the team to build a smoke testing system to guard against similar incidents. The system uses New Relic Alerts and synthetics to simulate incidents: it periodically inserts a message into the pipeline, checks that the message comes out the other end, and triggers an alert if the expected signal is not received. Shy Peleg, Director of Software Engineering at New Relic, explains that this proactive check gives customers an additional layer of protection and makes New Relic's services more reliable, and he stresses the importance of detecting and addressing potential system failures swiftly.
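The heartbeat idea described above can be illustrated with a minimal sketch. This is not New Relic's implementation, which relies on New Relic Alerts and synthetics rather than custom code; the class name `HeartbeatMonitor`, its fields, and the timing values are hypothetical and chosen only to show the logic of alerting when an expected signal stops arriving.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Hypothetical helper: tracks the last heartbeat seen at the pipeline's output."""

    max_silence_seconds: float         # assumed threshold; not from the post
    started_at: float                  # when monitoring began (seconds)
    last_seen: Optional[float] = None  # timestamp of the newest heartbeat, if any

    def record(self, timestamp: float) -> None:
        # Called whenever a heartbeat message is observed leaving the pipeline.
        # Out-of-order arrivals never move the marker backwards.
        if self.last_seen is None or timestamp > self.last_seen:
            self.last_seen = timestamp

    def should_alert(self, now: float) -> bool:
        # Measure silence from the newest heartbeat, or from startup
        # if no heartbeat has arrived yet.
        reference = self.last_seen if self.last_seen is not None else self.started_at
        return now - reference > self.max_silence_seconds
```

In this sketch, a scheduled job would insert a test message at the pipeline's input, call `record` when it appears at the output, and evaluate `should_alert` on a timer. Alerting on the absence of a signal, rather than on an error, is what lets the check catch failures such as a deleted resource that produce no error at all.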