
Code scanning and Ruby: turning source code into a queryable database

Blog post from GitHub

Post Details
Company: GitHub
Author: Nick Rolfe
Word Count: 2,844
Language: English
Summary

GitHub has introduced beta support for Ruby in CodeQL, the engine behind its code scanning, to help developers write secure code. The addition is particularly relevant to GitHub, which uses Ruby on Rails itself. CodeQL works by running queries against a database representation of a program, so supporting a new language requires an extractor that parses source code and stores it in relational form. For Ruby, GitHub uses tree-sitter, a parser framework known for its speed and error recovery. Building on it, the team developed a schema generator that automatically translates tree-sitter’s grammar descriptions into a CodeQL database schema. Because this approach is language-agnostic, it simplifies database creation and makes it possible to support additional languages through tree-sitter's existing parsers. The Ruby extractor was tested on GitHub's large Ruby on Rails application, where it proved efficient and showed the potential for expansion to other languages. Multi-threaded extraction, implemented in Rust, has significantly improved performance, making the process faster and more scalable. Ruby support is now available in public beta for developers who want to analyze their Ruby projects.
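
To make the idea of "parsing source code into a relational form" concrete, here is a minimal sketch in Python. It is not the CodeQL extractor: the `Node` class, the single `nodes` table, and the node kind names are all hypothetical, the syntax tree is built by hand rather than produced by tree-sitter, and SQLite with SQL stands in for the CodeQL database and its query language.

```python
import sqlite3
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    """Hypothetical stand-in for a parser's syntax-tree node."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


def extract(root):
    """Flatten a tree into rows of (id, kind, text, parent_id, child_index)."""
    ids = count(1)
    rows = []

    def visit(node, parent_id, index):
        node_id = next(ids)
        rows.append((node_id, node.kind, node.text, parent_id, index))
        for i, child in enumerate(node.children):
            visit(child, node_id, i)

    visit(root, None, 0)
    return rows


def build_database(rows):
    """Load the extracted rows into an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE nodes ("
        "id INTEGER PRIMARY KEY, kind TEXT, text TEXT, "
        "parent_id INTEGER, child_index INTEGER)"
    )
    db.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?, ?)", rows)
    return db


# Hand-built tree for the Ruby statement `x = 1 + 2` (illustrative kinds only).
tree = Node("assignment", children=[
    Node("identifier", "x"),
    Node("binary", "+", children=[Node("integer", "1"), Node("integer", "2")]),
])

db = build_database(extract(tree))

# Query: integer literals that are direct operands of a binary expression.
literals = db.execute(
    """
    SELECT child.text
    FROM nodes AS child
    JOIN nodes AS parent ON child.parent_id = parent.id
    WHERE child.kind = 'integer' AND parent.kind = 'binary'
    ORDER BY child.child_index
    """
).fetchall()
print(literals)
```

Once the tree is stored as rows, a question about the program ("which integer literals appear as operands?") becomes an ordinary join over parent and child nodes. A generated schema would replace the single generic `nodes` table here with one table per node type.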