Company
Date Published
Author
PJ Hyett
Word count
402
Language
English
Hacker News points
None

Summary

GitHub began as a side project, and its rapid growth forced a rethink of its infrastructure, in particular how repositories are stored. Repositories were initially kept in a straightforward directory structure, but as the user base expanded that layout became unsustainable because of the high volume of input/output operations it generated. The pivotal change was to shard repositories by an MD5 hash of the username, which lets storage scale more efficiently. According to calculations by Tom Preston-Werner, the new layout should remain viable even with significant user growth. The sharded layout was not part of the initial launch because building it would have delayed the release. With it in place, operations run smoothly and the team is free to focus on new features for users, with a significant update anticipated soon.
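
To make the sharding idea concrete, here is a minimal Python sketch of mapping a username to a sharded directory. The summary only says that repositories are sharded by an MD5 hash of the username; the exact layout is not given, so the root path, the use of the first three hex characters as nested directories, and the function name `sharded_repo_path` are all assumptions made for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_repo_path(
    username: str,
    repo: str,
    root: str = "/data/repositories",  # hypothetical storage root
    levels: int = 3,                   # assumed number of shard directories
) -> PurePosixPath:
    """Return a storage path whose leading directories come from the
    MD5 hex digest of the username (illustrative layout, not GitHub's)."""
    digest = hashlib.md5(username.encode("utf-8")).hexdigest()
    # One directory per leading hex character, e.g. "0cc1..." -> 0/c/c
    shards = list(digest[:levels])
    return PurePosixPath(root, *shards, username, f"{repo}.git")


if __name__ == "__main__":
    print(sharded_repo_path("alice", "project"))
```

Under these assumptions, three hex characters give 16^3 = 4096 buckets, and because the hash depends only on the username, all of a user's repositories land in the same bucket while different users spread roughly evenly across buckets.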