AWS outage and why it again proves full-stack observability is non-negotiable
Blog post from New Relic
In October 2025, a major outage in AWS's Northern Virginia (us-east-1) region disrupted more than 140 services. The cause was a DNS failure affecting the DynamoDB API, which shows how fragile tightly interconnected cloud architectures can be. The incident affected major platforms such as Snapchat, Venmo, and Reddit, and the post estimates the business cost of such downtime at $2.2 million per hour. The post argues that observability tools like New Relic are critical to limiting this kind of impact: real-time visibility, faster detection, AI-assisted troubleshooting, and end-to-end tracing reduce both the mean time to detect and the mean time to resolve issues. The outage also underscored the importance of understanding service dependencies and maintaining a robust observability strategy to limit the effects of future disruptions.
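To make the cost argument concrete, here is a minimal sketch of the arithmetic behind it, assuming downtime cost scales linearly with outage duration. Only the $2.2 million per hour figure comes from the post. The function names and the 90-minute and 60-minute durations are hypothetical and chosen purely for illustration.

```python
# Cost per hour of downtime as quoted in the post.
COST_PER_HOUR_USD = 2_200_000


def downtime_cost(minutes: float, cost_per_hour: float = COST_PER_HOUR_USD) -> float:
    """Estimate the cost of an outage lasting `minutes`, assuming linear scaling."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return cost_per_hour * minutes / 60


def savings_from_faster_response(
    baseline_minutes: float,
    improved_minutes: float,
    cost_per_hour: float = COST_PER_HOUR_USD,
) -> float:
    """Estimate the cost avoided when detection plus resolution time shrinks."""
    return downtime_cost(baseline_minutes, cost_per_hour) - downtime_cost(
        improved_minutes, cost_per_hour
    )


if __name__ == "__main__":
    # Hypothetical durations: 90 minutes to detect and resolve without
    # end-to-end visibility, 60 minutes with it.
    saved = savings_from_faster_response(baseline_minutes=90, improved_minutes=60)
    print(f"Estimated cost avoided: ${saved:,.0f}")
```

Under these assumptions, every minute removed from detection or resolution is worth roughly $36,700, which is why the post treats mean time to detect and mean time to resolve as the metrics that matter.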