Company
Date Published
Author
Ably
Word count
1524
Language
-
Hacker News points
None

Summary

An AWS outage in the us-east-1 region on October 20, 2025, exposed how vulnerable many services are when they rely on cloud providers, and how hard it is for businesses to stay resilient against regional failures. Ably, a platform designed for 100% uptime, points to its behavior during the incident as evidence of a fault-tolerant architecture built on several strategies: operating across multiple geographic regions with truly independent data centers, running active-active configurations so traffic can be rerouted seamlessly, and integrating multiple underlying service providers to avoid single points of failure. Ably also emphasizes continuous monitoring and smart client-side SDKs that automatically detect failures and reroute around them, so users experience minimal impact during disruptions. The company's approach is to design systems that withstand infrastructure failures rather than to rely on service level agreements (SLAs) that only compensate customers after an outage. In this view, reliability is proven when the infrastructure keeps working so well that users do not notice a failure has occurred.
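
The client-side rerouting described above can be illustrated with a minimal sketch. This is not Ably's SDK code. The function name `pick_endpoint`, the `is_healthy` probe, and the `example.test` host names are all hypothetical, and the randomized fallback order is an assumption chosen so that clients do not all converge on the same fallback when a region fails.

```python
import random
from typing import Callable, Optional, Sequence


def pick_endpoint(
    primary: str,
    fallbacks: Sequence[str],
    is_healthy: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return the first endpoint that passes the health probe.

    The primary is tried first; fallbacks are then tried in random order.
    Returns None if every endpoint fails the probe.
    """
    rng = rng or random.Random()
    # rng.sample returns a new shuffled list and leaves `fallbacks` untouched.
    candidates = [primary] + rng.sample(list(fallbacks), k=len(fallbacks))
    for endpoint in candidates:
        try:
            if is_healthy(endpoint):
                return endpoint
        except OSError:
            # Treat network errors (timeouts, refused connections) as unhealthy.
            continue
    return None


# Hypothetical hosts: the primary region is down, two other regions are up.
down = {"us-east-1.example.test"}
chosen = pick_endpoint(
    "us-east-1.example.test",
    ["eu-west-1.example.test", "ap-southeast-2.example.test"],
    lambda host: host not in down,
)
print(chosen)  # one of the two healthy fallback hosts
```

In a real client the probe would be a connection attempt with a timeout, and the selection would be repeated whenever the current connection drops, so that recovery of the primary region is picked up as well.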