Dodging S3 Downtime With Nginx and HAProxy
We use Amazon S3 as the primary data store for uploaded artifacts such as JavaScript source maps and iOS debug symbols, which are critical to our event processing pipeline. Recently, S3 suffered an outage that lasted 3 hours. The impact on our processing pipeline was minimal, because we had already put an S3 proxy cache, built with Nginx and HAProxy, in front of it.

We had four goals for the proxy: minimize risk to production-facing traffic, avoid single points of failure, prove the concept without increasing hardware costs, and avoid committing changes to application code. We chose Nginx as the proxy server for its `proxy_cache` feature, which stores each S3 asset on disk when it is requested. We chose HAProxy to route requests back to S3 if Nginx fails.

In the new infrastructure, HAProxy directs traffic to the cache server, which proxies upstream to Amazon. If the cache server fails, HAProxy fails over to S3 without interrupting requests. Configuring HAProxy was quick and took only eight lines. Configuring Nginx involved more steps, because it does the heavy lifting of caching.

We ran the proxy service for a week while monitoring our bandwidth and S3 bill, which dropped significantly. Then, during the S3 outage, the proxy cache served all of Sentry's S3 assets, and the event processing pipeline kept flowing smoothly. Overall, implementing the proxy cache took less than a day. It cut our S3 bandwidth cost by 70% and improved the performance and reliability of our event processing pipeline.
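To make the request path concrete, here is a minimal Python sketch of the same two layers. It is not the actual Nginx or HAProxy configuration, and every name in it (`DiskCacheProxy`, `FailoverRouter`, `make_origin`) is hypothetical. `DiskCacheProxy` plays the role of the Nginx `proxy_cache`: a read-through cache that writes each object to disk the first time it is fetched. It assumes uploaded artifacts are immutable, so entries never expire. `FailoverRouter` plays the role of HAProxy: it tries the cache first and falls back to the origin on any error. The per-request retry is a simplification. HAProxy normally detects a dead backend through health checks rather than by retrying each request.

```python
import hashlib
import os
import tempfile
from typing import Callable, Dict

# A "fetcher" takes an object key and returns its bytes, or raises on failure.
Fetch = Callable[[str], bytes]


class DiskCacheProxy:
    """Read-through disk cache, loosely modeled on Nginx proxy_cache.

    Assumption: cached objects are immutable, so entries never expire and a
    cached key is served without contacting the origin at all.
    """

    def __init__(self, cache_dir: str, origin: Fetch) -> None:
        self.cache_dir = cache_dir
        self.origin = origin
        self.hits = 0
        self.misses = 0

    def _path(self, key: str) -> str:
        # Hash the key so arbitrary object names map to safe file names.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest)

    def get(self, key: str) -> bytes:
        path = self._path(key)
        if os.path.exists(path):
            self.hits += 1
            with open(path, "rb") as f:
                return f.read()
        self.misses += 1
        body = self.origin(key)  # raises if the origin is unreachable
        # Write to a temp file, then rename, so readers never see a partial entry.
        fd, tmp_path = tempfile.mkstemp(dir=self.cache_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(body)
        os.replace(tmp_path, path)
        return body


class FailoverRouter:
    """Try the primary backend (the cache); on any error, use the backup (the origin)."""

    def __init__(self, primary: Fetch, backup: Fetch) -> None:
        self.primary = primary
        self.backup = backup

    def get(self, key: str) -> bytes:
        try:
            return self.primary(key)
        except Exception:
            return self.backup(key)


def make_origin(objects: Dict[str, bytes], state: Dict[str, bool]) -> Fetch:
    """Hypothetical stand-in for S3 that fails whenever state["down"] is True."""

    def fetch(key: str) -> bytes:
        if state["down"]:
            raise ConnectionError("origin unavailable")
        return objects[key]

    return fetch


if __name__ == "__main__":
    state = {"down": False}
    origin = make_origin({"app.js.map": b"{...}"}, state)
    with tempfile.TemporaryDirectory() as cache_dir:
        cache = DiskCacheProxy(cache_dir, origin)
        router = FailoverRouter(cache.get, origin)
        router.get("app.js.map")  # cache miss: fetched from the origin, written to disk
        state["down"] = True      # simulate the S3 outage
        # The second request is served from disk while the origin is down.
        print(router.get("app.js.map"), cache.hits, cache.misses)
```

The sketch also shows the limit of this design. During an origin outage, only objects that were already cached can be served. A request for an uncached key fails through both paths, because the cache and the fallback both depend on the same origin.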