Company
Date Published
Author
David Tuber, Emily Music, Bryton Herdes
Word count
1351
Language
English
Hacker News points
None

Summary

On August 21, 2025, a traffic surge from a single customer severely congested the network links between Cloudflare and Amazon Web Services (AWS) us-east-1, causing high latency and packet loss for users. The incident was not an attack. The disruption stemmed primarily from insufficient capacity in Cloudflare's edge routers and a pre-existing failure in one of the direct peering links. AWS tried to relieve the congestion by withdrawing some BGP advertisements, which inadvertently made matters worse by rerouting traffic onto links that were already strained. The incident was mitigated by rate limiting the customer's traffic and by Cloudflare and AWS coordinating to restore normal operations. In response, Cloudflare is implementing a multi-phase plan to prevent a recurrence, including adding network capacity and developing a traffic management system so that no single customer's usage can affect the entire network.
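
The dynamics described in the summary can be illustrated with a toy model. This is only a sketch under stated assumptions, not Cloudflare's or AWS's actual tooling: the link names, capacities, demand figures, and the helper functions `link_utilization` and `rate_limit` are all hypothetical, and traffic is assumed to spread evenly (in proportion to capacity) across whichever links are still advertised. It shows why withdrawing a link can push the remaining links past capacity, and how capping one customer's traffic can bring utilization back under 100%.

```python
def link_utilization(total_gbps, capacities_gbps, withdrawn=()):
    """Return the shared utilization of the links that are still advertised.

    Assumption: traffic is split across remaining links in proportion to
    their capacity, so every remaining link has the same utilization.
    """
    remaining = {
        link: cap for link, cap in capacities_gbps.items() if link not in withdrawn
    }
    if not remaining:
        raise ValueError("no links left to carry traffic")
    return total_gbps / sum(remaining.values())


def rate_limit(demand_gbps, customer, limit_gbps):
    """Return a copy of the demand table with one customer capped."""
    limited = dict(demand_gbps)
    limited[customer] = min(limited[customer], limit_gbps)
    return limited


# Hypothetical figures, chosen only to make the effect visible.
capacities = {"link_a": 100, "link_b": 100, "link_c": 50}
demand = {"heavy_customer": 150, "everyone_else": 50}

# All links advertised: busy, but under capacity.
before = link_utilization(sum(demand.values()), capacities)

# One link withdrawn: the same traffic lands on less capacity.
after_withdrawal = link_utilization(
    sum(demand.values()), capacities, withdrawn={"link_a"}
)

# Capping the heavy customer relieves the remaining links.
limited = rate_limit(demand, "heavy_customer", 60)
after_limit = link_utilization(
    sum(limited.values()), capacities, withdrawn={"link_a"}
)

print(f"before withdrawal: {before:.0%}")
print(f"after withdrawal:  {after_withdrawal:.0%}")
print(f"after rate limit:  {after_limit:.0%}")
```

In this simplified model a utilization above 100% stands in for the congestion, latency, and packet loss that users experienced. Real routing depends on BGP path selection and per-link behavior that the sketch deliberately ignores.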