Company
Date Published
Author
GitHub Engineering
Word count
1988
Language
English
Hacker News points
None

Summary

GitHub has reworked how it generates and displays diffs, which are computationally expensive, around a more efficient approach built on the git-diff-tree command. Previously, GitHub applied conservative limits when loading diffs to avoid overloading servers and making browsers unresponsive, but those limits often produced truncated diffs and frequent timeouts. The new approach first produces a high-level overview of the changes and then loads individual diffs progressively, which improves the user experience and reduces timeouts. To verify accuracy and measure performance, GitHub ran the old and new systems in parallel. Rename detection was a particular challenge, and the git-diff-pairs command was introduced to preserve rename associations so that the correct diff text is retrieved for each file. GitHub also improved the accuracy of change statistics by using git-diff-tree --numstat --shortstat, which collects complete statistics even when only part of the diff is displayed. By tuning its limits based on usage metrics, GitHub significantly improved diff page performance, reducing timeouts and improving overall site performance, and laid the groundwork for future optimizations.
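As a rough illustration of the statistics step, the sketch below parses `--numstat` output from `git diff-tree` into per-file counts and totals. It is not GitHub's implementation: the names `FileStat`, `parse_numstat`, `totals`, and `numstat_for_range` are hypothetical, and it assumes the default tab-separated output (not `-z`), where binary files report `-` instead of line counts.

```python
import subprocess
from typing import List, NamedTuple, Optional, Tuple


class FileStat(NamedTuple):
    """Per-file change counts (hypothetical helper type)."""
    path: str
    added: Optional[int]    # None when git reports "-" (binary file)
    deleted: Optional[int]  # None when git reports "-" (binary file)


def parse_numstat(output: str) -> List[FileStat]:
    """Parse tab-separated --numstat lines: added<TAB>deleted<TAB>path."""
    stats = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            # Skip anything that is not a numstat row, such as a
            # commit id line or the --shortstat summary line.
            continue
        added, deleted, path = parts
        stats.append(FileStat(
            path,
            None if added == "-" else int(added),
            None if deleted == "-" else int(deleted),
        ))
    return stats


def totals(stats: List[FileStat]) -> Tuple[int, int, int]:
    """Return (files changed, lines added, lines deleted)."""
    return (
        len(stats),
        sum(s.added or 0 for s in stats),
        sum(s.deleted or 0 for s in stats),
    )


def numstat_for_range(repo: str, old: str, new: str) -> List[FileStat]:
    """Run git diff-tree between two tree-ishes and parse the result.

    Assumes `git` is on PATH and `repo` is a local repository path.
    -r recurses into subtrees; -M enables rename detection.
    """
    result = subprocess.run(
        ["git", "-C", repo, "diff-tree", "-r", "-M", "--numstat", old, new],
        check=True, capture_output=True, text=True,
    )
    return parse_numstat(result.stdout)
```

Because the statistics come from a separate, cheap pass over the whole change, the totals can stay complete even when a page renders the text of only some of the files.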