How we improved push processing on GitHub
Blog post from GitHub
A push to GitHub triggers many downstream processes, including pull request synchronization and push webhook dispatch. Previously, a single monolithic background job, RepositoryPushJob, handled all of these tasks. The job was complex, its individual tasks were hard to retry, and the tight coupling of concerns meant that a failure in one task could propagate to the others. To address these issues, GitHub rebuilt push processing around Kafka events, which decouple the tasks into isolated, parallel processes managed by different service owners. The new architecture reduced dependencies between tasks, clarified ownership, lowered latency, improved observability, and raised the reliability of push processing from 99.897% to 99.999% (a drop in the failure rate from 0.103% to 0.001%).
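The decoupling described above can be illustrated with a minimal sketch. It is not GitHub's implementation: an in-memory bus stands in for a Kafka topic, and every name in it (PushEvent, PushEventBus, the handler names, the retry limit) is hypothetical. Handlers run one after another here to keep the output deterministic, whereas real consumers would run in parallel. The point is only that each task receives the same push event, retries on its own, and cannot take the others down when it fails.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class PushEvent:
    """Hypothetical push payload; the field names are illustrative only."""
    repository: str
    ref: str
    before: str
    after: str


Handler = Callable[[PushEvent], None]


@dataclass
class PushEventBus:
    """In-memory stand-in for a Kafka topic with one consumer per task."""
    max_attempts: int = 3  # assumed retry limit, not taken from the source
    handlers: Dict[str, Handler] = field(default_factory=dict)

    def subscribe(self, name: str, handler: Handler) -> None:
        self.handlers[name] = handler

    def publish(self, event: PushEvent) -> Dict[str, str]:
        """Deliver the event to every handler and report a status per handler."""
        return {
            name: self._deliver(handler, event)
            for name, handler in self.handlers.items()
        }

    def _deliver(self, handler: Handler, event: PushEvent) -> str:
        # Each handler is retried in isolation; an exception never
        # escapes to the other handlers.
        for attempt in range(1, self.max_attempts + 1):
            try:
                handler(event)
                return f"ok after {attempt} attempt(s)"
            except Exception:
                continue
        return "failed"


def run_demo() -> Dict[str, str]:
    bus = PushEventBus(max_attempts=3)
    webhook_calls = {"count": 0}

    def pull_request_sync(event: PushEvent) -> None:
        pass  # succeeds on the first attempt

    def broken_consumer(event: PushEvent) -> None:
        raise RuntimeError("always fails")

    def webhook_dispatch(event: PushEvent) -> None:
        # Fails once, then succeeds, to show a per-task retry.
        webhook_calls["count"] += 1
        if webhook_calls["count"] == 1:
            raise RuntimeError("transient failure")

    bus.subscribe("pull_request_sync", pull_request_sync)
    bus.subscribe("broken_consumer", broken_consumer)
    bus.subscribe("webhook_dispatch", webhook_dispatch)

    event = PushEvent("octo/example", "refs/heads/main", "abc123", "def456")
    return bus.publish(event)


if __name__ == "__main__":
    for name, status in run_demo().items():
        print(f"{name}: {status}")
```

In this sketch the always-failing consumer exhausts its retries and is reported as failed, while pull request synchronization and webhook dispatch still complete. In a monolithic job, the same exception would have to be handled inside one shared code path, which is the coupling the new architecture removes.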