Lessons we keep learning from the industry’s biggest outages

Post Details

Company

Unleash

Date Published

Nov. 24, 2025

Author

Michael Ferranti

Word Count

885

Language

-

Hacker News Points

-

Source URL

www.getunleash.io/blog/lessons-we-keep-learning-from-the-industrys-biggest-outages

Summary

Over the past six months, two major outages, one at Google Cloud and one at Cloudflare, have highlighted a recurring problem: small, seemingly benign backend changes that cause widespread disruption. These incidents underscore a pattern in which minor configuration updates, shipped without runtime controls such as feature flags or kill switches, propagate unexpectedly through complex systems and cause significant downtime. Despite numerous postmortems and recommendations from industry leaders and frameworks, many engineering teams continue to prioritize deployment speed over reversibility and overlook the importance of runtime controls. The oversight persists even though such controls are essential in high-impact areas like authentication, data routing, and infrastructure upgrades. Operational excellence at scale means treating every backend change as one that may need to be reversed, and protecting it with robust runtime controls.
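To make the idea of a runtime control concrete, here is a minimal sketch of a kill switch guarding a backend change. It is an illustration under stated assumptions, not how Google Cloud, Cloudflare, or Unleash implement it: the `FlagStore` class, the `new-routing-config` flag name, and the two routing functions are all hypothetical, and a real system would sync flag state from a flag service rather than hold it in memory.

```python
import threading


class FlagStore:
    """Hypothetical in-memory flag store (assumption: a real one syncs from a flag service)."""

    def __init__(self):
        self._flags = {}
        self._lock = threading.Lock()

    def set(self, name, enabled):
        with self._lock:
            self._flags[name] = enabled

    def is_enabled(self, name, default=False):
        # Unknown flags fall back to the default, so a missing flag means "old behavior".
        with self._lock:
            return self._flags.get(name, default)


def legacy_routing(request_id):
    # Hypothetical existing code path.
    return f"legacy:{request_id}"


def new_routing(request_id):
    # Hypothetical new code path introduced by the backend change.
    return f"new:{request_id}"


def route(flags, request_id):
    # The flag is checked on every call, so flipping it takes effect without a redeploy.
    if flags.is_enabled("new-routing-config"):
        return new_routing(request_id)
    return legacy_routing(request_id)


flags = FlagStore()
before = route(flags, "r1")             # flag absent: legacy path
flags.set("new-routing-config", True)
during = route(flags, "r2")             # change enabled at runtime
flags.set("new-routing-config", False)
after = route(flags, "r3")              # kill switch flipped: back to the legacy path
```

The design choice worth noting is the default: when the flag is missing or disabled, the code takes the old path, so reversing the change is a flag flip rather than a rollback deployment.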