Content Deep Dive
A History of HTML Parsing at Cloudflare: Part 2
Blog post from Cloudflare
Post Details
Company
Date Published
Author
Andrew Galloni, Ivan Nikulin
Word Count
3,142
Language
English
Hacker News Points
-
Source URL
Summary
In 2017, developers using Workers, Cloudflare's edge compute platform, wanted HTML rewriting capabilities similar to those Cloudflare used internally. To meet this demand, Cloudflare built a streaming HTML rewriter/parser with a CSS selector-based API in Rust and open-sourced it as LOL HTML. The major change from the previous rewriter, LazyHTML, is a dual-parser architecture, needed to overcome the additional performance overhead of wrapping and unwrapping each token as it is propagated to the Workers runtime. The new approach significantly speeds up parsing and reduces output latency and memory consumption.