Making open source data more available

Blog post from GitHub

Post Details
Company: GitHub
Author: Arfon Smith
Word Count: 325
Company Posts That Month: 19
Language: English
Summary

In collaboration with Google, GitHub has released a new, expanded dataset on BigQuery that builds on the original GitHub Archive project from 2012. At more than 3TB, it is the largest available source of GitHub activity data, covering more than 2.8 million open source repositories, over 145 million unique commits, and the contents of 163 million files. Researchers, organizations, and developers can search the dataset with regular expressions and analyze open source activity and trends. The initiative aims to document the vast body of human knowledge encoded in software, and future work will focus on making open source data more accessible and useful to a wider range of users. The dataset is available to explore on Google Cloud for anyone interested in open source communities and software development patterns.
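
As a rough illustration of what a regular-expression search over file contents could look like, here is a minimal Python sketch. It rests on several assumptions not stated in the post: the table name `bigquery-public-data.github_repos.contents` and its `id`, `size`, and `content` columns, the helper names `build_regex_query` and `matching_files`, and the sample files are all hypothetical or taken from outside the summary. The first helper only builds a BigQuery standard SQL string with a named `@pattern` parameter; the second applies the same pattern locally with Python's `re` module so the idea can be tried without a Google Cloud account.

```python
import re
from typing import Dict, List

# Assumption: the summary does not name any tables. This identifier is how the
# public GitHub dataset is commonly referenced; verify it before relying on it.
CONTENTS_TABLE = "bigquery-public-data.github_repos.contents"


def build_regex_query(limit: int = 10) -> str:
    """Build a BigQuery standard SQL string that regex-filters file contents.

    The pattern is left as the named query parameter @pattern, so it has to be
    supplied separately when the query is submitted.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT id, size\n"
        f"FROM `{CONTENTS_TABLE}`\n"
        "WHERE REGEXP_CONTAINS(content, @pattern)\n"
        f"LIMIT {int(limit)}"
    )


def matching_files(files: Dict[str, str], pattern: str) -> List[str]:
    """Local stand-in: return the sorted paths whose text matches the pattern."""
    rx = re.compile(pattern)
    return sorted(path for path, text in files.items() if rx.search(text))


# Hypothetical sample data standing in for rows of file contents.
SAMPLE_FILES = {
    "a.py": "import numpy as np\n",
    "b.go": "package main\n",
    "c.py": "import os\nfrom numpy import array\n",
}

# (?m) makes ^ match at the start of every line, not only the first.
NUMPY_IMPORT = r"(?m)^(?:import|from)\s+numpy\b"

if __name__ == "__main__":
    print(build_regex_query(limit=5))
    print(matching_files(SAMPLE_FILES, NUMPY_IMPORT))
```

BigQuery evaluates regular expressions with RE2 syntax, which differs from Python's `re` in some features (for example, RE2 has no backreferences), so a pattern tested locally may need adjusting. The query string could then be submitted through a BigQuery client along with a value for `@pattern`. Scanning file contents across a multi-terabyte dataset can be costly, so a small `LIMIT` and a narrow pattern are sensible starting points.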
