Company
Date Published
Author
Colin Sidoti
Word count
1511
Language
English
Hacker News points
None

Summary

Between September 14 and September 18, 2025, Clerk experienced a database incident that caused significant request failures and latency spikes for customer traffic. The trigger was an automatic minor version upgrade of the Postgres database by the cloud provider. The upgrade removed a bottleneck in connection handling; with that bottleneck gone, connection cycling became synchronized and overwhelmed the system. The engineering team responded with query optimization, traffic shaping, and a manual database upgrade, and ultimately identified the root cause and resolved it by adjusting the database connection pooling configuration. Overlapping events and metrics resolution made the issue difficult to diagnose. Clerk plans further infrastructure improvements, including evaluating database providers and enhancing service isolation, and remains committed to improving reliability and regaining customer trust.
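
The synchronized connection cycling mentioned in the summary can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not Clerk's actual configuration or fix: the summary does not say which pooling settings were changed. The function names, the 200-connection pool, the 300-second lifetime, and the 60-second jitter are all hypothetical. The sketch only shows why connections that open together and share a fixed maximum lifetime tend to reconnect in the same second, and how randomizing the lifetime (one common mitigation) spreads that load.

```python
import random
from collections import Counter


def reconnect_times(num_conns, max_lifetime_s, jitter_s, horizon_s, seed=0):
    """Return the (whole) second at which each connection recycle happens.

    Assumption: every connection opens at t=0, for example right after a
    database restart, and is recycled once its lifetime expires.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(num_conns):
        t = 0.0
        while True:
            # Each lifetime is the base value plus a random jitter in [0, jitter_s].
            t += max_lifetime_s + rng.uniform(0, jitter_s)
            if t > horizon_s:
                break
            times.append(int(t))
    return times


def peak_reconnects_per_second(times):
    """Largest number of reconnects that land in any single second."""
    return max(Counter(times).values()) if times else 0


# Hypothetical pool: 200 connections, 300 s lifetime, observed for 900 s.
synchronized = reconnect_times(num_conns=200, max_lifetime_s=300, jitter_s=0, horizon_s=900)
jittered = reconnect_times(num_conns=200, max_lifetime_s=300, jitter_s=60, horizon_s=900)

print("peak reconnects/s without jitter:", peak_reconnects_per_second(synchronized))
print("peak reconnects/s with jitter:   ", peak_reconnects_per_second(jittered))
```

Without jitter, all 200 hypothetical connections recycle in the same second, so the database sees the whole pool reconnect at once. With jitter, the same recycles are spread across a window of seconds and the per-second peak drops sharply.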