
How We Fixed a ReDoS Vulnerability That Turned 15-Second Requests into 5-Minute Timeouts at Context.dev

Blog post from Context.dev

Post Details
Company: Context.dev
Date Published:
Author: Yahia Bakour
Word Count: 2,415
Language: English
Hacker News Points: -
Summary

Context.dev traced request timeouts and CPU overload to a single regular expression that backtracked catastrophically during HTML parsing. The team first misdiagnosed the problem as a scaling issue, then found that one regex used to extract Open Graph meta tags was consuming excessive CPU time, especially on malformed HTML with long strings and missing quotes. This is Regular Expression Denial of Service (ReDoS): nested quantifiers in a pattern force a backtracking regex engine to try exponentially many ways of matching the same input. Context.dev first added temporary fixes such as timeouts and circuit breakers, but the lasting fix had two parts: switching to Google's RE2 engine, which matches in time linear in the input size, and dropping regex for HTML parsing altogether in favor of Cheerio and streaming HTML parsers that handle large documents efficiently. The change resolved the immediate performance problems and reinforced the broader lessons: use purpose-built parsers rather than regex for HTML, and put monitoring and testing in place to catch similar issues before they reach production.
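
The backtracking mechanism described above can be reproduced with a small sketch. The summary does not quote the original pattern, so both regexes below are hypothetical stand-ins, as are the names `nestedOgTitle`, `flatOgTitle`, `malformedTag`, and `measure`. The sketch times a pattern with a nested quantifier against a flat equivalent on a tag whose `content` attribute is missing its closing quote.

```typescript
// Hypothetical patterns for illustration; neither is the regex from the post.
// VULNERABLE: (?:[^"]+)+ nests one quantifier inside another, so a long
// unterminated run can be split into groups in exponentially many ways.
const nestedOgTitle = /<meta\s+property="og:title"\s+content="((?:[^"]+)+)"/;

// SAFER: a single quantifier leaves far fewer ways to consume the same run.
const flatOgTitle = /<meta\s+property="og:title"\s+content="([^"]+)"/;

interface Timing {
  length: number;
  nestedMs: number;
  flatMs: number;
}

// Time one exec() call in milliseconds (coarse, but enough to see the trend).
function timeExec(pattern: RegExp, input: string): number {
  const start = Date.now();
  pattern.exec(input);
  return Date.now() - start;
}

// Build a malformed tag: the content attribute never gets its closing quote.
function malformedTag(length: number): string {
  return '<meta property="og:title" content="' + "a".repeat(length);
}

// Run both patterns against malformed tags of the given lengths.
function measure(lengths: number[]): Timing[] {
  return lengths.map((length) => {
    const html = malformedTag(length);
    return {
      length,
      nestedMs: timeExec(nestedOgTitle, html),
      flatMs: timeExec(flatOgTitle, html),
    };
  });
}

// Keep lengths small: in a backtracking engine, each extra character
// roughly doubles the work the nested pattern does before giving up.
console.table(measure([14, 16, 18, 20]));
```

Both patterns return the same capture on well-formed input and both fail on the malformed tag; only the time to fail differs. Flattening the quantifier is shown here purely to isolate the cause. The fix the post describes is a linear-time engine (RE2) and a real HTML parser, not a hand-tuned regex.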