Company
Date Published
Author
Julie Gunderson
Word count
5503
Language
English
Hacker News points
None

Summary

In this podcast episode, Chris Martello, the manager of application performance at Cengage, discusses his journey from middle school science teacher to chaos engineering enthusiast and emphasizes the role of chaos engineering in improving software reliability. He explains how Cengage uses chaos engineering to prepare for the peak traffic events typical of the higher education sector, which align with academic calendars. Martello highlights the value of practicing chaos through collaborative fire drills, which improve communication and reduce mean time to resolution during actual outages. He recounts how the "Day of Darkness," an 18-hour service outage, led Cengage to adopt regular chaos testing that involves 16 different teams and is aimed at ensuring system stability and performance. Martello also underlines the critical role of customer support in responding to performance issues and describes how chaos testing has become an integral part of Cengage's engineering culture, contributing to a robust and reliable user experience.