Home / Companies / Unleash / Blog / Post Details
Content Deep Dive

Lessons we keep learning from the industry’s biggest outages

Blog post from Unleash

Post Details
Company
Date Published
Author
Michael Ferranti
Word Count
885
Language
-
Hacker News Points
-
Summary

Over the past six months, two significant outages at Google Cloud and Cloudflare have highlighted the recurring issue of small, seemingly benign backend changes causing widespread disruptions. These incidents underscore a pattern where minor configuration updates, lacking runtime controls like feature flags or kill switches, propagate unexpectedly through complex systems, resulting in significant downtimes. Despite numerous postmortems and recommendations from industry leaders and frameworks, many engineering teams continue to prioritize deployment speed over reversibility, often overlooking the critical importance of runtime controls. This oversight persists even though such controls are essential for managing high-impact areas like authentication, data routing, and infrastructure upgrades, proving that operational excellence at scale necessitates treating all backend changes as potentially reversible and ensuring they are protected by robust runtime controls.