OmniDocBench is Saturated, What’s Next for OCR Benchmarks?
Blog post from LlamaIndex
Recent advances in document Optical Character Recognition (OCR) and Vision-Language Models (VLMs) have been notable, particularly with the introduction of models like GLM-OCR, which achieved a state-of-the-art (SOTA) score of 94.6% on OmniDocBench v1.5. OmniDocBench has become a standard for evaluating document understanding models, but it faces criticism on two fronts: its dataset covers a limited range of document types, and its evaluation metrics are rigid, penalizing models for minor differences that carry no semantic weight.

Even as highly accurate models like GLM-OCR emerge, document parsing remains hard because real-world documents are complex and variable: multi-column layouts, nested tables, handwriting, and degraded scans all fall outside what a single benchmark score captures. Because current benchmarks emphasize exact string matches over semantic correctness, they can discourage innovation in models designed to handle these complex documents.

What the field needs is a more comprehensive and flexible benchmark, one that measures semantic accuracy, to better reflect the capabilities required for document parsing in diverse real-world contexts.
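To make the exact-match criticism concrete, here is a minimal sketch (assuming a character-level normalized edit distance, in the spirit of common OCR text metrics; the document strings are hypothetical) showing how two semantically identical outputs that differ only in Markdown list-marker style still incur a penalty:

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Ground truth and a prediction that differ only in list-marker style --
# any human reader would call these the same document.
gt   = "- Revenue grew 12%\n- Margins held steady"
pred = "* Revenue grew 12%\n* Margins held steady"

dist = levenshtein(gt, pred)
normalized = dist / max(len(gt), len(pred))
print(f"edit distance: {dist}, normalized penalty: {normalized:.3f}")
# -> edit distance: 2, normalized penalty: 0.050
```

A semantics-aware metric would score these two outputs as identical, while an exact-match metric docks the model even though no information was lost; multiplied across formatting choices in whitespace, table syntax, and heading styles, such penalties can meaningfully distort a leaderboard.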