Extract text from documents and images with Datalab Marker and OCR

Post Details

Company

Replicate

Date Published

Oct. 21, 2025

Author

andreasjansson

Word Count

594

Company Posts That Month

3

Language

English

Hacker News Points

-

Source URL

replicate.com/blog/datalab-marker-and-ocr-fast-parsing

Summary

Datalab's document parsing and text extraction models, Marker and OCR, are now available on Replicate. Marker quickly converts PDFs, images, and other document formats into Markdown or JSON. It handles tables and math, and it can extract specific fields defined by a JSON Schema. OCR recognizes text in ninety languages and returns reading order and table grids. According to the post, both models are faster and more accurate than established tools such as Tesseract. Marker performs especially well on structured extraction, as its olmOCR-Bench results show. Both models can be called from code on Replicate, and pricing varies by usage mode.