Scaling monorepo maintenance
Blog post from GitHub
GitHub has changed how it maintains and repacks some of its largest and fastest-growing Git repositories, making repacks faster and improving performance. Its maintenance job traditionally repacked each repository into a single packfile, which is costly in time and resources, especially for large repositories. To avoid this, GitHub developed multi-pack indexes, which allow efficient object lookups across multiple packs, and multi-pack bitmaps, which extend reachability bitmaps beyond a single pack. Building on these, a geometric repacking strategy distributes objects across multiple packfiles and concentrates repacking work on recently added objects, which shortens repack times and reduces how often a full repository repack is needed. The changes significantly reduce CPU time and repack duration, and the improvements are being contributed to the open-source Git project for future releases.
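
The geometric strategy can be illustrated with a small sketch. It assumes packs are compared by object count and that the target is a progression in which each pack holds at least `factor` times as many objects as the next-smaller one. The function name `packs_to_combine` and the factor of 2 are hypothetical and chosen for illustration; this is a simplified model of the idea, not GitHub's or Git's actual implementation.

```python
from typing import List, Tuple


def packs_to_combine(object_counts: List[int], factor: int = 2) -> Tuple[List[int], List[int]]:
    """Split packs into (rolled_up, kept).

    rolled_up: object counts of the small packs to merge into one new pack.
    kept:      object counts of the larger packs that are left untouched.
    Assumption: a pack's "size" is its object count.
    """
    packs = sorted(object_counts)
    n = len(packs)

    # Walk from the largest pack down while neighbours already satisfy
    # packs[i] >= factor * packs[i - 1].
    i = n - 1
    while i > 0 and packs[i] >= factor * packs[i - 1]:
        i -= 1

    # If the progression broke at pair (i - 1, i), both of those packs
    # must be rolled up, so the split sits just past index i.
    split = i + 1 if i > 0 else 0

    # The merged pack may now be too large relative to the next pack;
    # absorb larger packs until the progression holds again.
    total = sum(packs[:split])
    while split < n and packs[split] < factor * total:
        total += packs[split]
        split += 1

    return packs[:split], packs[split:]


if __name__ == "__main__":
    # Two tiny packs from recent pushes, one medium pack, one large pack.
    rolled_up, kept = packs_to_combine([100, 10, 3, 3])
    print(rolled_up, kept)  # [3, 3, 10] [100]
```

In this example only the three small packs (16 objects in total) are rewritten, while the 100-object pack is left intact. That is the property the strategy relies on: most maintenance runs touch only recently added objects, and a repack of everything happens only when the small packs have grown large enough to disturb the progression.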