Company
Date Published
Author
Gavin Cahill
Word count
943
Language
English
Hacker News points
None

Summary

A major apparel company needed to test cross-region failover for its AWS Lambda-based payment application but had no access to the underlying infrastructure, a common limitation of serverless models. The team used Gremlin's Failure Flags, a tool designed for application-level fault injection in managed environments, to simulate a regional outage and test the failover process within 30 minutes. The test confirmed that the system was resilient, a capability that could protect millions in sales, and also revealed areas where performance could be improved. Encouraged by these results, the company plans to expand Gremlin reliability testing to all of its deployment environments, including Lambda, EKS, and ECS, to further improve system resilience and performance.