
Podcast: Break Things on Purpose | Ep. 11: Ryan Kitchens, Senior Site Reliability Engineer at Netflix

Blog post from Gremlin

Post Details
Author: Rich Burroughs
Word Count: 8,646
Language: English
Summary

In this episode of the podcast "Break Things on Purpose," Ryan Kitchens, Senior Site Reliability Engineer (SRE) at Netflix, discusses the complexities and challenges of reliability engineering, comparing his experience at Netflix with his earlier work at Blizzard Entertainment on World of Warcraft. Kitchens explores the nuances of managing incidents at scale, highlighting the importance of understanding mental models, the limitations of root cause analysis, and the role of chaos engineering in improving system resilience. The conversation examines why incidents should not be viewed merely as failures but as opportunities to learn and to evolve systems. Kitchens argues that organizations should focus on continuous improvement and on learning from incidents to build resilience and adaptability, rather than aiming solely to eliminate incidents. He also discusses why engaging diverse perspectives during incident reviews matters for generating insights and improving organizational practices.