Author
Michelle Gienow
Word count
1061
Language
English

Summary

CockroachDB: The Definitive Guide stresses architecting applications for scalability, resilience, and low-latency performance, particularly in the face of disasters such as fires and cloud provider outages. At RoachFest23, Thomas Boltze of Santander noted that human error is often the primary cause of system failures, which makes robust resiliency practices essential. He described Santander's journey toward a resilient payments system. By continuously testing, identifying, and addressing failures, the team eventually built a system that could withstand data center outages and keep processing payments without interruption. The key to their success was a cultural shift toward curiosity, shared responsibility, and automation. That shift produced a system designed to survive multi-region and multi-cloud failures and keep serving customers even during significant outages.