Company
Date Published
Author
The ClickHouse team
Word count
1011
Language
English
Hacker News points
None

Summary

The `git-import` tool, distributed with ClickHouse, extracts information from Git repositories for analytics. It generates three main tables (commits, file changes, and line changes) that can be loaded into ClickHouse or another DBMS. The generated data includes over 7 million rows of commits, 53 million rows of file changes, and 2.7 GB of line changes, which leaves plenty of material for analysis. Several questions are proposed for users to answer, including reconstructing the output of the `git blame` command, which is particularly challenging due to ClickHouse's lack of `arrayFold` or `arrayReduce` functions. Users are encouraged to solve this query and others using the generated data, with a t-shirt prize offered for the first correct solution. Tips for working with the data are also given: check out the same commit so that results can be compared, and account for file renames, which affect change history.
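To illustrate why `git blame` calls for a fold, here is a minimal Python sketch of the idea: the blame state after each commit depends on the state before it, so the line changes have to be applied in order while carrying that state along. The `LineChange` record and its field names are hypothetical and simplified for illustration; they are not the schema produced by `git-import`, and the sketch ignores file renames.

```python
from functools import reduce
from typing import NamedTuple


class LineChange(NamedTuple):
    """Hypothetical, simplified line-change record (not the tool's schema)."""
    commit: str       # commit identifier
    sign: int         # +1 = line added, -1 = line removed
    line_number: int  # 1-based; old-file position for -1, new-file position for +1
    text: str


def apply_commit(blame_state, changes):
    """Fold step: blame state before a commit -> blame state after it."""
    state = list(blame_state)
    # Remove deleted lines bottom-up so earlier positions stay valid.
    removed = sorted((c for c in changes if c.sign == -1),
                     key=lambda c: c.line_number, reverse=True)
    for c in removed:
        del state[c.line_number - 1]
    # Insert added lines top-down at their positions in the new file.
    added = sorted((c for c in changes if c.sign == 1),
                   key=lambda c: c.line_number)
    for c in added:
        state.insert(c.line_number - 1, (c.commit, c.text))
    return state


def blame(commits):
    """Fold all commits (oldest first) into a list of (commit, text) per line."""
    return reduce(apply_commit, commits, [])


# Two commits touching one file: c1 creates three lines,
# c2 rewrites line 2 and appends a fourth line.
history = [
    [LineChange("c1", 1, 1, "a"), LineChange("c1", 1, 2, "b"),
     LineChange("c1", 1, 3, "c")],
    [LineChange("c2", -1, 2, "b"), LineChange("c2", 1, 2, "B"),
     LineChange("c2", 1, 4, "d")],
]

for commit, text in blame(history):
    print(commit, text)
```

The `reduce` call is the part that has no direct equivalent without a fold-style function: each step needs the full result of the previous one, which a plain per-row or per-group aggregation does not provide.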