
Status page

Blog post from Replicate

Post Details
Author: nickstenning
Word Count: 1,385
Language: English
Summary

Replicate had a significant outage on May 11, 2023, affecting both its website and its API. The problems centered on its PostgreSQL database, which is central to the platform.

The trigger was a newly released asynchronous update feature. It inadvertently issued simultaneous INSERT queries for the same prediction ID. Query latencies rose until the database connection pool was exhausted. The team first suspected the previous day's database resizing, but the real cause was the dangerous query pattern that the asynchronous updates created. During the incident the replicate.com website failed intermittently. Customers who received updates via webhooks were less affected.

Replicate disabled the problematic feature, which restored normal operation, and is now redesigning it to avoid similar issues. The incident clarified the system's constraints. It prompted a review of other potential database hotspots and a push to make the platform more resilient to database disruptions.
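Why would duplicate INSERTs for one prediction ID exhaust a connection pool? The toy model below sketches one plausible mechanism. It is not Replicate's code, and it rests on these assumptions:

- The prediction ID is a unique key. PostgreSQL makes a second INSERT of the same unique key wait until the first transaction commits or aborts.
- A per-key `threading.Lock` stands in for that wait.
- A `BoundedSemaphore` stands in for the connection pool.
- The pool size of 4, the 50 ms write time, the helper `run_inserts`, and the ID `p-123` are all hypothetical.

```python
import threading
import time
from collections import defaultdict


def run_inserts(prediction_ids, pool_size=4, work_s=0.05):
    """Toy model (hypothetical) of concurrent INSERTs sharing a bounded pool.

    Returns (elapsed_seconds, peak_connections_held_while_waiting).
    """
    pool = threading.BoundedSemaphore(pool_size)  # stand-in for the DB connection pool
    key_locks = defaultdict(threading.Lock)       # stand-in for unique-key conflicts
    registry = threading.Lock()                   # guards lazy creation in key_locks
    counters = {"waiting": 0, "peak": 0}
    counters_lock = threading.Lock()

    def insert(pid):
        with pool:  # a connection stays checked out for the query's whole lifetime
            with registry:
                lock = key_locks[pid]
            if not lock.acquire(blocking=False):
                # Another INSERT for this ID is in flight: block on it while
                # still holding our pooled connection.
                with counters_lock:
                    counters["waiting"] += 1
                    counters["peak"] = max(counters["peak"], counters["waiting"])
                lock.acquire()
                with counters_lock:
                    counters["waiting"] -= 1
            try:
                time.sleep(work_s)  # simulated write
            finally:
                lock.release()

    threads = [threading.Thread(target=insert, args=(pid,)) for pid in prediction_ids]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start, counters["peak"]


if __name__ == "__main__":
    same = run_inserts(["p-123"] * 8)                     # 8 writers, one hypothetical ID
    distinct = run_inserts([f"p-{i}" for i in range(8)])  # 8 writers, 8 IDs
    print(f"same ID:      {same[0]:.2f}s, peak connections waiting = {same[1]}")
    print(f"distinct IDs: {distinct[0]:.2f}s, peak connections waiting = {distinct[1]}")
```

With distinct IDs, up to four writes proceed in parallel and no connection ever sits idle. With a single ID, the writes serialize. Meanwhile, connections held by blocked writers do no work and are unavailable to every other query, which is how latency growth can turn into pool exhaustion.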