On February 6, 2025, a database query error caused downtime for 3,700 customers, and aggressive automatic retries nearly overloaded the infrastructure. The incident lasted 26 minutes before it was resolved. It prompted Clerk, a provider of mission-critical infrastructure, to conduct a detailed review and implement several remediations to prevent recurrence. Immediate actions included tuning retry mechanisms, restricting direct database access, mandating staged rollouts for critical infrastructure changes, and improving SDK resilience to minimize dependency on the service during outages. The company also plans to decouple session management from user management to prevent cascading failures, and to stop using JSON column types for structured data so that strict typing is enforced. Clerk emphasized its commitment to transparency, reliability, and continuous improvement, and is prioritizing these changes to fortify its platform against future incidents.
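
The summary does not say how Clerk tuned its retry mechanisms. As an illustration only, one common way to keep automatic retries from amplifying an outage is capped exponential backoff with random jitter and a fixed attempt budget. The sketch below assumes that approach; the function names (`backoff_delay`, `call_with_retries`) and the default values are hypothetical and are not taken from Clerk's code.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.2, cap: float = 10.0,
                  rng: Optional[random.Random] = None) -> float:
    """Return a jittered delay in seconds for a zero-based attempt number.

    The upper bound doubles with each attempt but never exceeds `cap`;
    the actual delay is drawn uniformly between 0 and that bound so that
    many clients do not retry in lockstep.
    """
    rng = rng or random  # fall back to the module-level generator
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def call_with_retries(operation: Callable[[], T], max_attempts: int = 4,
                      base: float = 0.2, cap: float = 10.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation`, making at most `max_attempts` calls in total."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # attempt budget exhausted: surface the error
            sleep(backoff_delay(attempt, base, cap))
    raise ValueError("max_attempts must be at least 1")
```

The attempt budget bounds the extra load a single caller can generate, and the jitter spreads retries over time instead of concentrating them. Passing `sleep` as a parameter is a design choice made here so that the behavior can be exercised without real waiting.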