Mistral OCR vs. Gemini Flash 2.0: Comparing VLM OCR Accuracy
Blog post from Reducto
Mistral AI recently released an OCR model that it describes as state-of-the-art, based on benchmarks that have not been publicly released. The launch drew significant attention online. Reducto's own testing, however, found a gap between the reported performance and the model's actual output when it was compared against Gemini 2.0 Flash on several datasets. Gemini handled document content effectively, while Mistral produced significant errors and hallucinations, including dropped information and misclassified layouts that changed how a document would be interpreted. On Reducto's RD-FormsBench dataset, Mistral was 43.4% less accurate than Gemini. Mistral also tended to mark large sections of a document as images, which hinders extraction of the text they contain. The discrepancy may stem from Mistral's use of a non-public benchmark dataset that could resemble its training data, which raises questions about how well the model performs in real-world use. Reducto plans to open-source its dataset to provide a more comprehensive reference for model evaluation. Document processing is not yet a solved problem, but advances in vision models show promising potential.
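A figure such as "43.4% less accurate" can be read in two ways: as a relative drop measured against the stronger model's score, or as an absolute gap in percentage points. The summary does not say which one Reducto used, so the sketch below simply shows both calculations. The function names and the two accuracy values are hypothetical placeholders chosen for illustration, not numbers from the RD-FormsBench evaluation.

```python
def relative_accuracy_gap(baseline: float, candidate: float) -> float:
    """Fraction by which `candidate` falls short of `baseline`.

    Both arguments are accuracies in [0, 1]; `baseline` must be positive.
    """
    if not 0.0 < baseline <= 1.0:
        raise ValueError("baseline must be in (0, 1]")
    if not 0.0 <= candidate <= 1.0:
        raise ValueError("candidate must be in [0, 1]")
    return (baseline - candidate) / baseline


def percentage_point_gap(baseline: float, candidate: float) -> float:
    """Absolute difference between two accuracies, in percentage points."""
    return (baseline - candidate) * 100.0


# Hypothetical scores, NOT taken from the Reducto evaluation.
baseline_acc = 0.80   # placeholder accuracy for the stronger model
candidate_acc = 0.50  # placeholder accuracy for the weaker model

rel = relative_accuracy_gap(baseline_acc, candidate_acc)
pts = percentage_point_gap(baseline_acc, candidate_acc)
print(f"{rel:.1%} less accurate (relative), {pts:.1f} points lower (absolute)")
```

The two readings can differ substantially for the same pair of scores, so it helps to know which one a headline figure refers to before comparing results across evaluations.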