Code scanning and Ruby: turning source code into a queryable database
Blog post from GitHub
GitHub has introduced beta support for Ruby in CodeQL, the engine behind its code scanning feature, to help developers write secure Ruby code. The language matters to GitHub directly, since GitHub itself uses Ruby on Rails. CodeQL works by running queries against a database representation of a program, so supporting a new language requires an extractor that parses source code and stores it in relational form. For Ruby, the extractor is built on tree-sitter, a parser framework known for its speed and error recovery. GitHub developed a schema generator that automatically translates tree-sitter's grammar descriptions into a CodeQL database schema. Because this step is language-agnostic, it simplifies database creation and opens the way to supporting additional languages that already have tree-sitter parsers. The Ruby extractor was tested on GitHub's own large Ruby on Rails application, and multi-threaded extraction in Rust has made the process significantly faster and more scalable. Ruby support is now available as a public beta for developers who want to analyze their Ruby projects.
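To make the schema-generation idea concrete, here is a minimal sketch in Python of how a grammar description could be turned into relational tables. It assumes input in the shape of tree-sitter's `node-types.json` (a list of node types with `fields` and `children` entries). The sample node names, the `generate_schema` and `render` helpers, and the `table(column, ...)` output notation are all hypothetical illustrations; this is not GitHub's generator (which the post describes as part of a Rust toolchain) and the output is not real CodeQL schema syntax.

```python
import json

# A tiny hand-written fragment in the shape of tree-sitter's node-types.json.
# The node and field names are illustrative, not copied from the Ruby grammar.
NODE_TYPES = json.loads("""
[
  {
    "type": "binary",
    "named": true,
    "fields": {
      "left":  {"multiple": false, "required": true,
                "types": [{"type": "identifier", "named": true}]},
      "right": {"multiple": false, "required": true,
                "types": [{"type": "identifier", "named": true}]}
    }
  },
  {
    "type": "argument_list",
    "named": true,
    "fields": {},
    "children": {"multiple": true, "required": false,
                 "types": [{"type": "identifier", "named": true}]}
  },
  {"type": "identifier", "named": true},
  {"type": "+", "named": false}
]
""")


def generate_schema(node_types):
    """Return {table_name: [column, ...]} for every named node type.

    Simplifying assumption: single-valued fields become columns of the
    node's own table, while repeated fields and unnamed children get a
    separate (parent, index, child) table.
    """
    tables = {}
    for node in node_types:
        if not node.get("named"):
            continue  # skip anonymous tokens such as "+"
        name = node["type"]
        fields = node.get("fields", {})
        columns = ["id"]
        if not fields and "children" not in node:
            columns.append("text")  # leaf token: keep its source text
        for field_name, spec in sorted(fields.items()):
            if spec["multiple"]:
                tables[f"{name}_{field_name}"] = ["parent", "index", "child"]
            else:
                columns.append(field_name)
        if "children" in node:
            tables[f"{name}_child"] = ["parent", "index", "child"]
        tables[name] = columns
    return tables


def render(tables):
    """Format the tables as one 'name(col, col, ...)' line each, sorted by name."""
    return "\n".join(
        f"{table}({', '.join(cols)})" for table, cols in sorted(tables.items())
    )


SCHEMA = generate_schema(NODE_TYPES)
print(render(SCHEMA))
```

Nothing in `generate_schema` mentions Ruby: feeding it a different grammar description yields a different set of tables, which is the sense in which this kind of generator is language-agnostic.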