In July, GitHub.com experienced a significant service disruption caused by a Kubernetes incident in which production Pods were marked as unavailable, reducing capacity and causing downtime. The incident began when a container exceeded its memory limit and was terminated. The problem was compounded by a DNS maintenance operation that prevented Kubernetes from pulling new container images, so Pods could not start. Initial mitigation efforts made the problem worse, but services were restored once cached DNS records were used. In response, GitHub plans to improve monitoring, reduce its dependency on the image registry, strengthen validation of DNS changes, reassess its Kubernetes deployment policies, and adopt a more incremental approach to deployments as part of a broader reliability initiative.