Content Deep Dive
January 25th Outage
Blog post from Ona
Post Details
Company:
Date Published:
Author: Pavel Tumik
Word Count: 1,134
Language: English
Hacker News Points: -
Source URL:
Summary
On January 25th, Gitpod suffered a global outage lasting over an hour, caused by a DNS failure inside its cluster. During the outage, new workspaces could not start and existing workspaces experienced data loss. The team began investigating immediately and spun up new clusters as a precaution. They found that traffic from the cluster could not reach Google's primary DNS server (8.8.8.8) on UDP port 53, so DNS queries timed out. Once traffic was diverted to the new clusters, Gitpod resumed normal operation. The team is now working to improve its data backup procedures and to make DNS resolution more resilient by using multiple DNS servers. As an apology for the outage, Gitpod is issuing credits to its customers.
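The failure mode described in the summary (queries to a single resolver on UDP port 53 timing out) and the proposed mitigation (using multiple DNS servers) can be illustrated with a small probe. This is a minimal sketch, not Gitpod's actual tooling: the function names build_dns_query, probe_udp53, and first_reachable are hypothetical, and any resolver address other than 8.8.8.8 is an assumption chosen for illustration.

```python
import socket
import struct


def build_dns_query(hostname: str, txid: int = 0x1234) -> bytes:
    """Encode a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, then three zero counts.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed; a zero byte terminates the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def probe_udp53(server: str, hostname: str, timeout: float = 2.0) -> bool:
    """Return True if `server` answers a DNS query on UDP port 53 in time."""
    query = build_dns_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable-network errors
            return False
    # A matching transaction ID means the reply belongs to our query.
    return len(reply) >= 2 and reply[:2] == query[:2]


def first_reachable(servers, hostname, probe=probe_udp53):
    """Try each resolver in order; return the first that responds, else None."""
    for server in servers:
        if probe(server, hostname):
            return server
    return None
```

For example, first_reachable(["8.8.8.8", "8.8.4.4", "1.1.1.1"], "example.com") would skip a resolver that times out and return the next one that answers. The second and third addresses are illustrative assumptions, not servers named in the post. In practice, resolver fallback is usually configured in the cluster's DNS layer rather than in application code.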