How We Handle Technical Incidents and Service Disruptions
Blog post from Tinybird
On December 19th, Tinybird experienced a significant service disruption that affected a major customer. The cause was a misconfiguration during an infrastructure update: the change had been tested rigorously in other environments, but the data path changes were never applied to the customer's production environment, so the servers pointed at incorrect directories and database queries failed. The incident lasted 28 minutes. Tinybird's team resolved it quickly, informed the affected customer immediately, and provided an incident report detailing the root cause and corrective measures, which included improving load balancer health checks and contributing a bug fix to ClickHouse®. Service disruptions are challenging and costly, but Tinybird treats them as opportunities to learn and improve its systems.
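
To make the health check idea concrete, here is a minimal sketch of a "deep" health check that a load balancer could poll, assuming the failure mode described above (a server that is running but reading from the wrong data directory). It is not Tinybird's actual implementation: the names `deep_health_check`, `HealthResult`, and `run_probe_query` are hypothetical, and the probe is a placeholder for whatever lightweight query a real deployment would run against known data.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class HealthResult:
    """Outcome of one health check; `reason` explains a failure."""
    healthy: bool
    reason: str


def deep_health_check(data_dir: str, run_probe_query: Callable[[], bool]) -> HealthResult:
    """Hypothetical check that goes beyond "is the process alive?".

    A server pointed at the wrong directory can still accept connections,
    so this verifies that the configured data directory exists, is not
    empty, and that a probe query (supplied by the caller) finds data.
    """
    path = Path(data_dir)
    if not path.is_dir():
        return HealthResult(False, f"data directory missing: {data_dir}")
    if not any(path.iterdir()):
        return HealthResult(False, f"data directory empty: {data_dir}")
    try:
        # Assumption: the probe returns True only when expected data is readable.
        if not run_probe_query():
            return HealthResult(False, "probe query returned no data")
    except Exception as exc:
        return HealthResult(False, f"probe query failed: {exc}")
    return HealthResult(True, "ok")
```

A load balancer endpoint built on a check like this would report the node as unhealthy when the data path is wrong, so traffic is routed away from it instead of reaching a server that can only return failing queries.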