Company
Date Published
Author
Jonathan Pearlin
Word count
1304
Language
English
Hacker News points
None

Summary

The New Relic engineering team uses the platform to strengthen its DevOps and reliability practices. Features such as capacity monitoring, SLA monitoring, SLI monitoring, data health, "dark data", and gameday testing help increase the reliability and availability of its products. The team automates part of its capacity planning process using custom metrics and events, calculates and monitors SLAs for notification latency, and has created an API system called Galileo to detect violations of key system health indicators across all of New Relic. It also uses data apps to track the fidelity of its data stream, gathers "dark data" about new features, and performs gameday tests to ensure everything works as expected when chaos is introduced into the system.