
Reliability lessons from the 2025 Microsoft Azure Front Door outage

Blog post from Gremlin

Post Details
Company: Gremlin
Author: Gavin Cahill
Word Count: 1,387
Language: English
Summary

On October 29, 2025, a major outage in Microsoft Azure Front Door disrupted global services including Microsoft 365, Outlook, and Xbox Live, and affected companies such as Costco and Starbucks. The outage stemmed from a misconfiguration in Azure's data plane and content delivery network. Despite a rapid initial response, full recovery took seven hours. The incident underscores the importance of redundancy and failover, and of rigorously testing dependencies with tools like Gremlin, which can simulate outages to verify how systems respond. It also shows that customers hold businesses accountable for service disruptions, so companies need systems robust enough to withstand such incidents. By mapping and testing their dependencies and understanding the reliability risks those dependencies carry, organizations can limit the impact and maintain service continuity even when a cloud provider has problems.
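
To make the failover idea concrete, here is a minimal Python sketch of trying a primary dependency and falling back to a secondary one when the primary is unavailable. It is an illustration only, not Gremlin's tooling or anything from Azure: `call_with_failover`, `primary_cdn`, `secondary_origin`, and `DependencyUnavailable` are all hypothetical names, and the outage is simulated by having the primary provider raise an exception.

```python
from typing import Callable, Sequence


class DependencyUnavailable(Exception):
    """Hypothetical error a provider raises when its upstream dependency is down."""


def call_with_failover(
    providers: Sequence[tuple[str, Callable[[], str]]],
) -> tuple[str, str]:
    """Try each (name, provider) pair in order and return (name, result)
    from the first provider that succeeds."""
    errors = []
    for name, provider in providers:
        try:
            return name, provider()
        except DependencyUnavailable as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{name}: {exc}")
    raise DependencyUnavailable("all providers failed: " + "; ".join(errors))


def primary_cdn() -> str:
    # Simulated outage (assumption): the primary edge always fails in this sketch.
    raise DependencyUnavailable("simulated edge outage")


def secondary_origin() -> str:
    # Hypothetical fallback path that bypasses the failed edge.
    return "content served from fallback origin"


served_by, body = call_with_failover(
    [("primary_cdn", primary_cdn), ("secondary_origin", secondary_origin)]
)
print(served_by, "->", body)
```

Running a check like this against a simulated failure shows whether the fallback path actually works before a real provider outage forces the question.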