Introducing RolmOCR: A Faster, Lighter Open Source Document Model Built on olmOCR
Blog post from Reducto
Earlier this year, the Allen Institute for AI released olmOCR, an open-source OCR model for parsing complex documents. Reducto has now released RolmOCR, a faster, more memory-efficient alternative built on olmOCR that maintains robust performance across a variety of document types.

RolmOCR differs from olmOCR in three ways. It is built on the newer Qwen2.5-VL-7B model. It does not use metadata, which shortens the prompt and reduces resource consumption without significantly affecting accuracy in most cases, although it may perform worse where metadata supplies essential context. And it is trained on the same dataset as olmOCR, with rotated data added to improve robustness.

In OCR tasks, RolmOCR shows equivalent or improved performance: for example, it recognizes characters in handwritten notes better and extracts information from low-contrast images more accurately. Without metadata, however, it may sometimes miss structured elements such as subtitles.

RolmOCR is released under the Apache 2.0 license and is open for exploration and further development, including enhancements tailored to specific needs. The developers welcome feedback and comparisons with other models.
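
To make the idea of rotated training data concrete, here is a minimal sketch of a rotation augmentation step. The post does not say how the rotation was applied, so everything here is an assumption for illustration: the function names (`rotate90`, `rotate`, `augment_with_rotations`), the restriction to 90-degree steps, the `fraction` parameter, and the use of plain nested lists in place of real page images.

```python
import random


def rotate90(grid):
    """Rotate a row-major pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def rotate(grid, quarter_turns):
    """Apply `quarter_turns` clockwise 90-degree rotations."""
    for _ in range(quarter_turns % 4):
        grid = rotate90(grid)
    return grid


def augment_with_rotations(pages, fraction, seed=0):
    """Rotate roughly `fraction` of the pages by 90, 180, or 270 degrees.

    Hypothetical helper: returns (page, angle) pairs so the applied
    angle can be inspected. The fraction and the set of angles are
    illustrative assumptions, not values taken from the post.
    """
    rng = random.Random(seed)
    augmented = []
    for page in pages:
        if rng.random() < fraction:
            turns = rng.choice([1, 2, 3])
            augmented.append((rotate(page, turns), turns * 90))
        else:
            augmented.append((page, 0))
    return augmented


if __name__ == "__main__":
    page = [[1, 2],
            [3, 4]]
    for rotated, angle in augment_with_rotations([page] * 5, fraction=0.5):
        print(angle, rotated)
```

In a real pipeline the same step would operate on page images rather than nested lists, while the target transcription stays unchanged, so the model learns to read text regardless of page orientation.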