Content Deep Dive
Data at GitHub
Blog post from GitHub
Post Details
Company
Date Published
Author
Brian Doll
Word Count
244
Language
English
Hacker News Points
-
Summary
GitHub Archive, a project started by Ilya Grigorik, records and archives GitHub's public timeline so that it can be analyzed. The archive is now available as a public dataset on Google BigQuery, where users can run queries for free and analyze GitHub's public activity data without storing it themselves. A bonus dataset on BigQuery explores correlations between programming languages, such as how likely a programmer who uses Objective-C is to also use Java, and includes observations about users of text editors like Emacs and Vim. Together, these datasets support a deeper understanding of programming trends and behavior on GitHub.
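
One way to read a language correlation such as "how likely is an Objective-C user to also use Java" is as a conditional probability over users. The sketch below is only an illustration of that idea, not the method behind the BigQuery dataset: the function name `language_correlation`, the input shape (a mapping from user to a set of languages), and the toy data are all assumptions made up for this example.

```python
def language_correlation(user_languages, given, other):
    """Estimate P(user uses `other` | user uses `given`).

    `user_languages` maps a user name to the set of languages seen
    in that user's public activity (hypothetical input shape).
    Returns None when no user uses `given`.
    """
    users_with_given = [
        langs for langs in user_languages.values() if given in langs
    ]
    if not users_with_given:
        return None
    both = sum(1 for langs in users_with_given if other in langs)
    return both / len(users_with_given)


# Made-up toy data, for illustration only (not real GitHub figures).
toy_users = {
    "user_a": {"Objective-C", "Java"},
    "user_b": {"Objective-C"},
    "user_c": {"Java", "Python"},
    "user_d": {"Objective-C", "Java", "Ruby"},
}

share = language_correlation(toy_users, "Objective-C", "Java")
print(f"Share of Objective-C users who also use Java: {share:.2f}")
```

In practice the per-user language sets would come from a query over the archived timeline rather than a hand-written dictionary. Note that the measure is not symmetric: the share of Java users who also use Objective-C is generally a different number.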