AWS outage and why it again proves full-stack observability is non-negotiable

Post Details

Company

New Relic

Date Published

Oct. 21, 2025

Author

Jones Zachariah Noel N, Senior Developer Relations Engineer

Word Count

1,843

Language

English

Hacker News Points

-

Source URL

newrelic.com/blog/infrastructure-monitoring/aws-outage-why-o11y-is-non-negotiable

Summary

In October 2025, a major outage in AWS's Northern Virginia (us-east-1) region disrupted more than 140 services. It was caused by a DNS failure in the DynamoDB API and showed how vulnerable interconnected systems within cloud architectures can be. Major platforms including Snapchat, Venmo, and Reddit were affected, and the business cost was significant, estimated at $2.2 million per hour of downtime. The post argues that observability tools such as New Relic are critical to limiting the impact of such incidents: real-time visibility, faster detection, AI-assisted troubleshooting, and end-to-end tracing reduce both the mean time to detect (MTTD) and the mean time to resolve (MTTR) issues. It also stresses the importance of understanding service dependencies and maintaining a robust observability strategy to minimize the effects of future disruptions.