Introducing /parse: Turn any document into LLM-ready data
Blog post from Firecrawl
Firecrawl has introduced /parse, a feature that lets users upload local files and receive the same kind of clean, structured output they get from web pages. It supports PDF, DOCX, DOC, ODT, RTF, XLSX, XLS, and HTML files, with a size limit of 50 MB per file. /parse is powered by a Rust-based engine that classifies pages and uses GPU resources only when necessary, which keeps processing fast while preserving each document's layout, tables, and reading order. Users can request output as markdown or structured JSON, with optional summaries and structured extraction driven by a user-provided JSON schema. The result is a single workflow for web and local documents, with data-security features such as Zero Data Retention available for enterprises.
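
The format list and size limit above lend themselves to a quick client-side check before uploading. The following is a minimal sketch in Python, not Firecrawl's API: the function names, the extension-based matching, the reading of 50 MB as 50 * 1024 * 1024 bytes, and the field names in the options payload ("formats", "schema") are all assumptions made for illustration. The actual request shape should be taken from Firecrawl's own documentation.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Formats named in the post; matching on file extension is an assumption.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".odt", ".rtf", ".xlsx", ".xls", ".html",
}

# The post states 50 MB per file; treating 1 MB as 1024 * 1024 bytes is an assumption.
MAX_FILE_BYTES = 50 * 1024 * 1024


def check_uploadable(name: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file passes both checks."""
    problems = []
    ext = Path(name).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(
            f"file is {size_bytes} bytes, over the {MAX_FILE_BYTES}-byte limit"
        )
    return problems


def build_options(formats: List[str], schema: Optional[Dict] = None) -> Dict:
    """Assemble a hypothetical options payload; field names are illustrative only."""
    options: Dict = {"formats": formats}
    if schema is not None:
        # A JSON schema describing the fields to extract, as the post describes.
        options["schema"] = schema
    return options
```

As a usage example, `check_uploadable("report.PDF", 1024)` returns an empty list, while a `.txt` file or a file one byte over the limit returns one problem each. Running the check locally avoids a round trip for files that would be rejected anyway.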