Data at GitHub

Blog post from GitHub

Post Details
Company
GitHub
Date Published
Author
Brian Doll
Word Count
244
Language
English
Summary

GitHub Archive, a project started by Ilya Grigorik, records and archives GitHub's public timeline so that it can be analyzed. The archive is now available as a public dataset on Google BigQuery, where users can run queries for free, so they can analyze GitHub's public activity data without storing it themselves. A bonus BigQuery dataset explores programming language correlations, such as how likely a programmer who uses Objective-C is to also use Java, and includes observations about users of text editors like Emacs and Vim. Together, these datasets make it easier to study programming trends and behavior on GitHub.
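
To make the language-correlation idea concrete, here is a minimal sketch of one way such a likelihood could be computed: the fraction of users of one language who also use another. It is not the method behind the BigQuery dataset, which the summary does not describe. The `USER_LANGUAGES` sample and the `conditional_usage` helper are hypothetical and invented purely for illustration.

```python
# Hypothetical sample data: user -> set of languages they use.
# These users and values are made up; they are NOT from GitHub Archive.
USER_LANGUAGES = {
    "alice": {"Objective-C", "Java"},
    "bob": {"Objective-C"},
    "carol": {"Java", "Python"},
    "dave": {"Objective-C", "Java", "Python"},
}


def conditional_usage(user_languages, given, other):
    """Return the fraction of users of `given` who also use `other`.

    Returns 0.0 when nobody in the sample uses `given`.
    """
    # Keep only the language sets of users who use the `given` language.
    given_users = [langs for langs in user_languages.values() if given in langs]
    if not given_users:
        return 0.0
    # Count how many of those users also use `other`.
    both = sum(1 for langs in given_users if other in langs)
    return both / len(given_users)


if __name__ == "__main__":
    share = conditional_usage(USER_LANGUAGES, "Objective-C", "Java")
    print(f"Objective-C users who also use Java: {share:.2f}")
```

In this toy sample, three users use Objective-C and two of them also use Java. Note that the measure is not symmetric: swapping `given` and `other` generally gives a different value.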