Test Database Fault Tolerance | CockroachDB AZ Failure Demo
Blog post from Cockroach Labs
CockroachDB's fault tolerance demo lets users simulate an actual availability zone (AZ) failure within their database cluster and observe how well the system maintains availability, performance, and data consistency during the disruption. Available in Public Preview, the demo triggers a real AZ failure and guides users through it with live metrics and narration. In traditional primary/secondary architectures, failover is disruptive; CockroachDB instead distributes nodes across AZs, so service continues uninterrupted when one zone fails. The demo runs for 10-15 minutes and uses a temporary database with a TPC-C workload to measure metrics both at baseline and during the disruption. It blocks network communication to the nodes in one AZ, which demonstrates leader re-election and data rebalancing. After 10 minutes, the disruption ends automatically, and the demo provides a full failover and recovery summary. The demo gives teams a controlled environment for testing database resilience before production.
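The reason a cluster spread across AZs can keep serving during a zone failure can be illustrated with a little quorum arithmetic. The sketch below is not taken from the demo itself: it assumes majority-quorum replication with three replicas per piece of data, and the zone names and the helper functions `quorum_size` and `survives_zone_loss` are hypothetical, introduced only for illustration.

```python
def quorum_size(replicas: int) -> int:
    """Smallest majority of a replica set (assumed majority-quorum replication)."""
    return replicas // 2 + 1


def survives_zone_loss(replica_zones: list[str], failed_zone: str) -> bool:
    """Return True if a majority of replicas sit outside the failed zone."""
    remaining = sum(1 for zone in replica_zones if zone != failed_zone)
    return remaining >= quorum_size(len(replica_zones))


# Hypothetical placements: one replica per zone versus two replicas in one zone.
spread = ["az-a", "az-b", "az-c"]
skewed = ["az-a", "az-a", "az-b"]

print(survives_zone_loss(spread, "az-a"))  # True: 2 of 3 replicas remain
print(survives_zone_loss(skewed, "az-a"))  # False: only 1 of 3 replicas remains
```

Under these assumptions, placing one replica in each of three zones means any single zone can be lost while a majority survives, which is the property the demo exercises when it cuts off one AZ.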