Preparing for the worst: Our core database failover test

Blog post from Vercel

Post Details
Company: Vercel
Author: Matheus Fernandes
Word Count: 1,037
Language: English
Summary

To strengthen operational resilience, Vercel's engineering team ran a full production failover of its core control-plane database from Azure West US to East US 2, with zero customer impact. The exercise covered all control-plane traffic, including API requests and deployment operations, while production CDN traffic remained unaffected. It was motivated by previous datacenter outages and aimed to verify that the architecture could maintain uptime and keep serving production traffic without interruption. Preparation included fixing issues with proprietary Cosmos DB clients and exercising codepaths across multiple staging failovers before the live run. The failover itself was executed efficiently with minimal operational impact, and system health was validated through both targeted and catch-all alerts. The team plans to keep refining its processes and infrastructure, and credits its partnership with Azure as crucial to achieving resilience and reliability.