Finding Python ReDoS bugs at scale using Dlint and r2c
Blog post from Semgrep
Regular expression denial-of-service (ReDoS) occurs when a specially crafted input string forces an inefficient regular expression to take an excessively long time to evaluate, opening the door to denial-of-service attacks. Common culprits are nested quantifiers, such as (a+)+, and mutually inclusive alternations, such as (a|a)*. Both can cause catastrophic backtracking, in which the regex engine tries an exponential number of ways to match before giving up, harming an application's security and availability.

To address these vulnerabilities, Python static analysis tools such as Dlint detect inefficient regex patterns that could lead to ReDoS. Dlint has been integrated into r2c's distributed analysis platform, which made it possible to scan a large number of open-source Python projects for these bugs. The findings include true positives, whose fixes make those projects more resilient, and false positives, which help refine the detection algorithm.

One notable finding was a ReDoS vulnerability in the HTTP authentication handling of Python's urllib.request module, which led to Python bpo-39503 and CVE-2020-8492. Together, these tools and methods help developers find and fix ReDoS vulnerabilities, making software more secure and reliable.
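To make the catastrophic backtracking described above concrete, here is a minimal timing sketch using Python's built-in re module. The patterns, the input sizes, and the helper name time_match are illustrative choices for this example, not taken from the original post, and actual timings will vary by machine and Python version.

```python
import re
import time

# A nested quantifier: the inner a+ and the outer (...)+ can split a run of
# "a" characters in exponentially many ways, and a backtracking engine tries
# them all once the overall match fails.
VULNERABLE = re.compile(r"^(a+)+$")

# An equivalent pattern without the nesting: it accepts the same strings,
# but there is only one way to consume each "a".
SAFE = re.compile(r"^a+$")


def time_match(pattern: re.Pattern, text: str) -> float:
    """Return the wall-clock seconds spent on a single match attempt (hypothetical helper)."""
    start = time.perf_counter()
    pattern.match(text)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (12, 14, 16, 18, 20):
        # The trailing "!" guarantees the match fails, forcing full backtracking.
        attack = "a" * n + "!"
        print(
            f"n={n:2d}  vulnerable={time_match(VULNERABLE, attack):.4f}s  "
            f"safe={time_match(SAFE, attack):.6f}s"
        )
```

On a backtracking engine such as CPython's re, the vulnerable pattern's running time should grow exponentially with n, while the safe pattern stays roughly flat. Rewriting the regex to remove the ambiguity, rather than limiting input length, is usually the more robust fix.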
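Static detection of the kind described above works on a regex's structure rather than on sample inputs. The sketch below is a deliberately naive check, not Dlint's actual algorithm: it parses a pattern and flags any unbounded quantifier nested inside another unbounded quantifier. It assumes access to Python's private regex parser (re._parser on 3.11+, sre_parse on earlier versions), which is an implementation detail that may change, and the function name find_nested_unbounded is hypothetical.

```python
try:
    from re import _parser as sre_parse  # Python 3.11+ (private module)
except ImportError:
    import sre_parse  # Python 3.10 and earlier

# Greedy and lazy repeats can both backtrack. Possessive repeats (3.11+)
# cannot, so they are deliberately not treated as repeats here.
REPEATS = (sre_parse.MAX_REPEAT, sre_parse.MIN_REPEAT)


def _children(op, av):
    """Return the sub-patterns nested inside one parsed regex node."""
    if op in REPEATS:
        return [av[2]]  # av is (min, max, item)
    if op == sre_parse.SUBPATTERN:
        return [av[-1]]  # av is (group, add_flags, del_flags, pattern)
    if op == sre_parse.BRANCH:
        return list(av[1])  # av is (None, [alternative, ...])
    if op in (sre_parse.ASSERT, sre_parse.ASSERT_NOT):
        return [av[1]]  # av is (direction, pattern)
    return []  # literals, character classes, anchors, and unhandled constructs


def find_nested_unbounded(pattern: str) -> bool:
    """Return True if an unbounded repeat appears inside another unbounded repeat."""

    def walk(subpattern, inside_unbounded: bool) -> bool:
        for op, av in subpattern:
            unbounded = op in REPEATS and av[1] == sre_parse.MAXREPEAT
            if unbounded and inside_unbounded:
                return True
            for child in _children(op, av):
                if walk(child, inside_unbounded or unbounded):
                    return True
        return False

    return walk(sre_parse.parse(pattern), False)


if __name__ == "__main__":
    for regex in [r"(a+)+$", r"^a+$", r"(\w+\s?)*$", r"(?:a|b)*c", r"(ab+)+"]:
        print(f"{regex!r:16} flagged={find_nested_unbounded(regex)}")
```

This check over-reports: (ab+)+ is flagged even though every iteration of the outer group must start with "a", so the engine has no ambiguous way to split the input. It also does not attempt to detect mutually inclusive alternations. A production tool has to reason about whether the nested parts can actually match the same characters, which is where real-world false positives become useful feedback.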