Recent Load Balancer Problems

Blog post from GitHub

Post Details
Company: GitHub
Date Published: -
Author: Jesse Newland
Word Count: 744
Company Posts That Month: 21
Language: English
Hacker News Points: -
Summary

In recent weeks, GitHub experienced service interruptions caused by instability in its high-availability load balancer setup. The root cause was excessive load on its Xen servers: Heartbeat checks repeatedly timed out, which triggered unnecessary failovers. GitHub responded by running Heartbeat checks less often and with longer timeouts, which significantly lowered the average load and the load variance across the Xen cluster and eliminated the false alerts. The team is also moving the load balancers onto a dedicated high-availability pair to isolate them from failures of other virtual servers, and is improving the load balancer configuration to reduce mean time to recovery. As part of its ongoing infrastructure work, GitHub has hired new system administrators to improve stability and says it is committed to improving the overall user experience.
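
The effect of the Heartbeat tuning can be illustrated with a small sketch. It is not Heartbeat's actual configuration format, and every number in it is invented for illustration: the `CheckPolicy` class, the intervals, the timeouts, and the sample latencies are all hypothetical. It only shows why less frequent checks with longer timeouts produce fewer spurious failures when a healthy service responds slowly on a loaded host.

```python
from dataclasses import dataclass


@dataclass
class CheckPolicy:
    """Hypothetical health-check policy; not Heartbeat's real settings."""
    interval_s: float  # seconds between consecutive checks
    timeout_s: float   # seconds to wait before a check counts as failed


def checks_per_hour(policy: CheckPolicy) -> float:
    """How many checks the monitor issues per hour under this policy."""
    return 3600.0 / policy.interval_s


def spurious_failures(latencies_s: list[float], policy: CheckPolicy) -> int:
    """Count checks that time out even though the service did respond."""
    return sum(1 for latency in latencies_s if latency > policy.timeout_s)


# Invented response times (seconds) of a healthy service on an overloaded host.
latencies = [0.2, 0.4, 6.0, 0.3, 12.0, 0.5, 8.0, 0.2]

aggressive = CheckPolicy(interval_s=10, timeout_s=5)   # frequent, short timeout
relaxed = CheckPolicy(interval_s=60, timeout_s=30)     # infrequent, long timeout

for name, policy in [("aggressive", aggressive), ("relaxed", relaxed)]:
    print(
        f"{name}: {checks_per_hour(policy):.0f} checks/hour, "
        f"{spurious_failures(latencies, policy)} spurious failures"
    )
```

The trade-off is that a longer timeout also delays detection of a genuine outage, so the relaxed policy buys stability at the cost of slower failover when a node really is down.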
