RedisLabs Incident

Post Details

Company

Rapid

Date Published

Sept. 14, 2020

Author

Iddo Gino

Word Count

576

Language

English

Hacker News Points

-

Source URL

rapidapi.com/blog/redislabs-incident

Summary

RapidAPI experienced recurring server issues in which approximately 0.2% of API requests returned 500 or 429 errors. The cause was an overburdened Redis database cluster that could not handle the additional network traffic from a 50% surge in daily request volume. A single Redis cluster had originally been sufficient, but as RapidAPI grew, the network link between RapidAPI and the Redis database became the weakest point: physical networking constraints on AWS led to dropped packets at high request volumes. To address this, RapidAPI moved to a new Redis Cluster with multiple hosts and VPC peering, which allows horizontal scaling and removes the single point of failure. Beyond resolving the immediate issue, the upgrade lays the groundwork for future improvements, including deployment across multiple global data centers to further reduce response times. The company credited the RapidAPI community for helping to identify and resolve the issue and thanked users for their patience and support.
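
As a rough illustration of why a multi-host Redis Cluster scales horizontally, the sketch below shows how open-source Redis Cluster maps each key to one of 16384 hash slots (CRC16 of the key, modulo 16384), so that traffic is spread across hosts instead of converging on one network link. The summary does not say how RapidAPI's cluster is laid out: the host names and the even split of slots across hosts are hypothetical, and a managed Redis Labs deployment may distribute keys differently.

```python
import binascii

TOTAL_SLOTS = 16384  # fixed number of hash slots in open-source Redis Cluster


def key_slot(key: bytes) -> int:
    """Return the hash slot for a key, following the Redis Cluster rule."""
    # Hash tags: if the key contains "{...}" with a non-empty body,
    # only the part between the first "{" and the next "}" is hashed.
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # binascii.crc_hqx with initial value 0 is the CRC16 (XMODEM) variant.
    return binascii.crc_hqx(key, 0) % TOTAL_SLOTS


def owning_host(slot: int, hosts: list[str]) -> str:
    """Pick a host for a slot, ASSUMING slots are split evenly in contiguous ranges."""
    return hosts[slot * len(hosts) // TOTAL_SLOTS]


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    hosts = ["redis-a", "redis-b", "redis-c"]
    for key in (b"rate:alice", b"rate:bob", b"{user1000}.followers"):
        slot = key_slot(key)
        print(key.decode(), slot, owning_host(slot, hosts))
```

Under this scheme, adding a host means reassigning some slot ranges to it, so capacity and network load can grow without routing every request through a single machine.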