The Cost of Overthinking: Why Reasoning Models Fail at Document Parsing
Blog post from LlamaIndex
An evaluation of reasoning levels in document OCR using GPT-5.2 found that higher reasoning levels, such as 'xHigh', do not necessarily improve accuracy on complex documents; they often add latency and cost without a corresponding gain in quality. The study covered a range of document difficulties, including complex tables and mixed text orientations, and showed that overthinking can produce structural deviations from the source, such as splitting a continuous table in two or hallucinating incorrect data.

By contrast, a pipeline-based approach, exemplified by the LlamaParse Agentic parser, separates the reading of pixels from the structuring of text and achieves more accurate and efficient results, outperforming higher reasoning levels on quality, speed, and cost. Specialized components handle the different aspects of OCR, avoiding common pitfalls such as hallucinations and resolution limits, and reasoning is reserved for complex structural decisions rather than pixel-level interpretation.
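The two-stage idea can be sketched as follows. This is a minimal illustration, not the LlamaParse API: `read_pixels` stands in for a cheap per-region OCR pass, and `structure` applies reasoning only to layout decisions, such as merging a table that was split across pages rather than emitting two separate tables. All names and the `Region` type are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Region:
    page: int
    kind: str   # "text" or "table_fragment"
    text: str

def read_pixels(pages):
    """Stage 1 (stub): OCR each page into typed regions.
    In a real pipeline this is a vision model reading pixels."""
    return [Region(page=i, kind=k, text=t)
            for i, (k, t) in enumerate(pages)]

def structure(regions):
    """Stage 2: structural reasoning only. Merge adjacent table
    fragments into one continuous table instead of splitting it."""
    out: list[Region] = []
    for r in regions:
        if (out and r.kind == "table_fragment"
                and out[-1].kind == "table_fragment"):
            out[-1] = Region(out[-1].page, "table_fragment",
                             out[-1].text + "\n" + r.text)
        else:
            out.append(r)
    return out

pages = [("table_fragment", "| a | 1 |"),
         ("table_fragment", "| b | 2 |"),
         ("text", "Summary paragraph.")]
doc = structure(read_pixels(pages))
# The two table fragments are rejoined into a single table region.
```

The point of the separation is that stage 1 never has to reason and stage 2 never has to see pixels, so neither component is tempted to "fill in" content it cannot verify.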