Company
Date Published
Author
Matthew Gilliard
Word count
1654
Language
English
Hacker News points
None

Summary

`jsoup` is a Java library for parsing HTML and XML documents. It parses HTML as leniently as web browsers do, so it handles malformed "tag soup" markup more effectively than traditional XML parsers. The author demonstrates how to use `jsoup` to fetch a web page, extract data with CSS selectors, and clean up malicious HTML to prevent cross-site scripting (XSS) attacks. Because the library has no external dependencies, it is easy to add to any Java project, which makes it a useful tool for web scraping and data extraction.
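The three tasks in the summary can be sketched roughly as follows. This is a minimal illustration, not the author's code. It assumes jsoup 1.14.1 or later on the classpath (earlier releases name the `Safelist` class `Whitelist`). The class name `JsoupSketch`, its helper methods, and the sample HTML are hypothetical. To keep the example runnable offline, it parses a string. `fetch` shows the `Jsoup.connect(...).get()` call you would use for a live page, but `main` never invokes it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Safelist;

// Hypothetical helper class; names are illustrative, not from the article.
public class JsoupSketch {

    // Fetch and parse a live page (needs network access; not called in main).
    static Document fetch(String url) throws IOException {
        return Jsoup.connect(url).get();
    }

    // Use a CSS selector to find every <a> that has an href,
    // and resolve each link to an absolute URL via the "abs:" prefix.
    static List<String> linkUrls(Document doc) {
        List<String> urls = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            urls.add(link.attr("abs:href"));
        }
        return urls;
    }

    // Reduce untrusted HTML to a basic safelist of tags and attributes,
    // dropping <script> elements and event-handler attributes such as onclick.
    static String cleanUntrusted(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        // The base URI lets "abs:href" resolve the relative link "/about".
        String html = "<html><head><title>Demo</title></head><body>"
                + "<a href='/about'>About</a> <a>no href</a></body></html>";
        Document doc = Jsoup.parse(html, "https://example.com/");
        System.out.println(doc.title());
        System.out.println(linkUrls(doc));

        String dirty = "<p><a href='https://example.com/' onclick='steal()'>link</a>"
                + "<script>alert('xss')</script></p>";
        System.out.println(cleanUntrusted(dirty));
    }
}
```

`Safelist.basic()` is only one of the built-in safelists. Stricter (`Safelist.none()`, `Safelist.simpleText()`) and looser (`Safelist.relaxed()`) options also exist, so choose the narrowest one that still keeps the markup you need.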