GitHub Availability Report: November 2023

Blog post from GitHub

Post Details
Company: GitHub
Author: Jakub Oleksy
Word Count: 265
Language: English
Summary

In November 2023, GitHub experienced a 38-minute incident that degraded performance across its services. On November 3, a memory leak in the authorization microservice caused excessive application memory use under high traffic. Testing did not detect the leak, and in production it led to failed authorization requests, with users receiving 404 or error responses. Pods began crashing repeatedly at 18:42 UTC, and alerts fired shortly after. Rolling back the change was delayed by dependencies in the deployment infrastructure, but the rollback was completed by 19:08 UTC, which restored all affected GitHub features. To prevent similar incidents, GitHub has revised its rollout strategy, adding monitoring and checks and removing the dependencies that hinder rollbacks. The company encourages users to check its status page for updates and the GitHub Engineering Blog for more on its ongoing work.