Scaling Semgrep rule coverage by spidering language documentation
Blog post from Semgrep
Semgrep has expanded its C# rule coverage for the .NET standard library, adding rules for vulnerabilities such as XML External Entities (XXE), Cross-Site Request Forgery in ASP.NET, and SQL injection, among others. To extend this coverage further, the security research team used Colly, a Go scraping library, to automate the extraction of advisories from Microsoft's .NET documentation, which is consistent and comprehensive enough to make this kind of scraping practical. A Colly spider that looked for warning boxes surfaced around 60 documentation pages to investigate, with the investigation focusing primarily on correctness issues, such as the limits of using the Double.Epsilon property for floating-point equality checks. The results suggest that well-structured documentation can be mined automatically for security insights. Possible future enhancements include fuzzy matching to handle inconsistencies in the documentation and sentiment analysis to identify cautionary language.
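To make the Double.Epsilon issue concrete, here is a minimal sketch of why it is a poor tolerance for equality checks. It is written in Go rather than C# and uses `math.SmallestNonzeroFloat64`, which is the same IEEE 754 value as .NET's `Double.Epsilon` (the smallest positive subnormal double, about 4.9e-324). The names `dotNetDoubleEpsilon`, `naiveEqual`, and `tolerantEqual` are hypothetical, and the relative tolerance of `1e-9` is an arbitrary choice for illustration. None of this is taken from the Semgrep rules themselves.

```go
package main

import (
	"fmt"
	"math"
)

// dotNetDoubleEpsilon stands in for .NET's Double.Epsilon: the smallest
// positive float64 (a subnormal, roughly 4.9e-324). Hypothetical name.
const dotNetDoubleEpsilon = math.SmallestNonzeroFloat64

// naiveEqual is the flawed pattern: using the smallest positive double as a
// tolerance. No two distinct finite float64 values differ by less than this,
// so the check behaves the same as a == b.
func naiveEqual(a, b float64) bool {
	return math.Abs(a-b) < dotNetDoubleEpsilon
}

// tolerantEqual compares with a caller-chosen relative tolerance instead.
// This is one common alternative, not the only reasonable one.
func tolerantEqual(a, b, relTol float64) bool {
	if a == b {
		return true
	}
	scale := math.Max(math.Abs(a), math.Abs(b))
	return math.Abs(a-b) <= relTol*scale
}

func main() {
	// Use variables: Go evaluates untyped constant expressions such as
	// 0.1 + 0.2 exactly, which would hide the rounding error.
	x, y := 0.1, 0.2
	sum := x + y

	fmt.Println(sum == 0.3)                    // false
	fmt.Println(naiveEqual(sum, 0.3))          // false: same as ==
	fmt.Println(tolerantEqual(sum, 0.3, 1e-9)) // true
}
```

The point is that the "epsilon" in the name suggests a usable tolerance, but the value is so small that the comparison degenerates to exact equality. A suitable tolerance depends on the magnitude of the values being compared.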