Company
Date Published
Author
Andre Newman
Word count
1210
Language
English
Hacker News points
None

Summary

At Chaos Conf 2020, speakers from Twilio and AWS shared insights on Chaos Engineering, focusing on system reliability and resilience. Tyler Wells from Twilio emphasized understanding systems through a structured framework that evaluates a service's context, assesses the quality of its testing, and compares its current service levels to its objectives. Adrian Cockcroft from AWS discussed avoiding "availability theater" by ensuring that systems can actually fail over effectively, and used analogies to highlight system complexity. Both talks emphasized proactive testing, such as GameDays, which simulate failures to reveal how a system behaves and where its resilience can be improved. The conference highlighted the importance of understanding and managing system complexity to enhance reliability, and pointed to additional resources such as Twilio's open-sourced GameDay template and guides for automating reliability management.