AWS outage and why it again proves full-stack observability is non-negotiable

Post Details

Company

New Relic

Date Published

Oct. 21, 2025

Author

Jones Zachariah Noel N, Senior Developer Relations Engineer

Word Count

1,843

Language

English

Hacker News Points

-

Source URL

newrelic.com/blog/how-to-relic/aws-outage-why-o11y-is-non-negotiable

Summary

In October 2025, the AWS Northern Virginia (us-east-1) region experienced a significant outage lasting over 15 hours. A DNS failure at the DynamoDB API endpoint set off a domino effect that affected more than 140 AWS services. The incident disrupted major AWS customers such as Snapchat, PayPal's Venmo, and Reddit, and exposed how tightly services within the AWS ecosystem depend on one another. Observability tools such as New Relic helped teams identify and mitigate the impact of the outage by providing real-time visibility and faster incident detection. The financial consequences of such outages are substantial, with potential losses reaching millions per hour, which underscores the value of observability strategies and AI-assisted tools in reducing downtime and improving resilience. New Relic's own platform kept its core functionality running throughout the outage because it relies only minimally on the affected region, illustrating the importance of a robust disaster recovery plan and of observability in managing service disruptions effectively.