How We Achieved 99.99% Reliability At Vapi

Blog post from Vapi

Post Details
Company: Vapi
Author: Abhishek Sharma
Word Count: 1,037
Language: English
Summary

Vapi overhauled its infrastructure to raise uptime from 99.9% to 99.99%, focusing on minimizing downtime and improving resilience. The work included migrating its database from Supabase to Neon for better stability, adding AWS Aurora for redundancy, and introducing a caching layer that serves 80% of database requests, which improves both speed and reliability. To address telephony issues, Vapi moved its SIP infrastructure to auto-scaling groups, which removed bottlenecks and lets the system absorb traffic spikes. Its "fallbacks everywhere" philosophy treats every external provider as a potential failure point and puts automatic failover in place for each. Deployments are protected by a multi-cluster architecture in which a Canary Manager manages traffic and automatically rolls back faulty updates. AWS Lambda burst workers handle spikes in voice traffic, using a custom proxy to communicate securely with the Kubernetes cluster. For business logic, Vapi integrated Temporal for durable execution of critical operations and added process isolation, circuit breakers, and comprehensive monitoring to prevent failures. These changes resulted in a 97% reduction in dropped calls, rapid failovers, and automated responses to provider outages, giving developers a more dependable foundation to build on.
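The "fallbacks everywhere" idea combined with circuit breakers can be illustrated with a minimal Python sketch. This is not Vapi's implementation, which the summary does not describe in detail. The names `CircuitBreaker`, `Provider`, and `call_with_fallback`, along with the thresholds, are hypothetical and chosen only for illustration. The sketch tries providers in priority order, skips any whose breaker is open, and retries a failed provider only after a cool-down.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors (hypothetical defaults)."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so the cool-down can be tested
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allows_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Open: allow trial calls again only once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()  # (re)open and restart the cool-down


@dataclass
class Provider:
    """An external dependency wrapped with its own breaker."""
    name: str
    call: Callable[[str], str]
    breaker: CircuitBreaker = field(default_factory=CircuitBreaker)


def call_with_fallback(providers: Sequence[Provider], request: str) -> Tuple[str, str]:
    """Try providers in priority order; return (provider name, result)."""
    errors: List[str] = []
    for provider in providers:
        if not provider.breaker.allows_request():
            errors.append(f"{provider.name}: circuit open")
            continue
        try:
            result = provider.call(request)
        except Exception as exc:  # any provider error triggers failover
            provider.breaker.record_failure()
            errors.append(f"{provider.name}: {exc}")
            continue
        provider.breaker.record_success()
        return provider.name, result
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Once the primary's breaker opens, requests go straight to the fallback without waiting on a provider that is known to be failing. The failing provider is probed again only after the cool-down.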