On February 28, 2017, Amazon's Simple Storage Service (S3) suffered a major outage in the US-EAST-1 region. The trigger was a mistyped command input: while debugging the S3 billing system, an engineer removed far more servers than intended. The outage disrupted numerous high-profile companies whose sites and services depended on S3. During the incident, Amazon struggled to post updates because the AWS Service Health Dashboard itself depended on S3, so the company fell back on Twitter for status updates. Amazon's post-incident analysis identified several areas for improvement: adding safeguards to the tooling so that capacity is removed more slowly and never below a minimum level, speeding up recovery of the affected S3 subsystems, and running the AWS Service Health Dashboard across multiple regions. For other organizations, the event underscored three practices: building redundancy and failover across regions, keeping communication channels independent of the systems they report on, and using chaos engineering experiments to routinely test how their systems behave when a third-party dependency fails.
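To make the redundancy and failure-testing recommendations above concrete, here is a minimal sketch in Python. It is not Amazon's implementation and it does not call any real AWS API: the region clients are stand-in functions, and every name in it (`RegionUnavailable`, `read_with_failover`, `make_fetcher`, `run_experiment`, and the second region `us-west-2`) is a hypothetical chosen for illustration. The sketch assumes the same data has already been replicated to both regions.

```python
from typing import Callable, Dict, List, Tuple


class RegionUnavailable(Exception):
    """Raised by a (hypothetical) region client that cannot serve a request."""


Fetcher = Callable[[str], bytes]


def read_with_failover(
    key: str, regions: List[str], fetchers: Dict[str, Fetcher]
) -> Tuple[str, bytes]:
    """Try each region in priority order; return (region, data) from the first that answers."""
    failed = []
    for region in regions:
        try:
            return region, fetchers[region](key)
        except RegionUnavailable:
            failed.append(region)  # remember the failure and try the next region
    raise RegionUnavailable(f"all regions failed for {key!r}: {failed}")


def make_fetcher(data: Dict[str, bytes], healthy: bool = True) -> Fetcher:
    """Build a stand-in region client; healthy=False simulates an injected outage."""
    def fetch(key: str) -> bytes:
        if not healthy:
            raise RegionUnavailable("injected outage")
        return data[key]
    return fetch


def run_experiment() -> Tuple[str, bytes]:
    """Chaos-style check: take the primary region down and confirm reads still succeed."""
    replicated = {"report.csv": b"id,total\n1,42\n"}  # assumed present in both regions
    fetchers = {
        "us-east-1": make_fetcher(replicated, healthy=False),  # injected fault
        "us-west-2": make_fetcher(replicated),
    }
    return read_with_failover("report.csv", ["us-east-1", "us-west-2"], fetchers)


if __name__ == "__main__":
    region, body = run_experiment()
    print(f"served from {region}: {len(body)} bytes")
```

In a real system the stand-in fetchers would wrap actual storage clients, and the fault would be injected at the network or dependency layer rather than by a flag. The point of the experiment is the same: verify the fallback path before an outage forces it.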