Reducto Raises Frontier Model Accuracy on GDP.pdf
Blog post from Reducto
Surge introduced GDP.pdf, a benchmark that tests whether frontier AI models can answer expert-level questions drawn from real-world professional documents. Even the top models struggled: Claude Fable 5 and GPT-5.5 achieved success rates of only 30% and 25%, respectively. Reducto aims to improve these results by giving models a structured parse of each document, which reduces errors such as misread merged cells and omitted footnotes. In experimental tests, Reducto's parsing raised the models' macro scores by 9 percentage points, with about 40% more tasks answered fully correctly. The largest gains came in reasoning-heavy areas such as engineering and STEM documents. Parsed input also uses fewer tokens, which lowers cost and shortens processing time. Reducto's solution requires a single API call per document, and the parsed output can be stored and reused, which improves efficiency and accuracy in production environments.
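The parse-once, store-and-reuse pattern can be sketched roughly as follows. This is a minimal illustration and not Reducto's actual client: `cached_parse`, `parse_fn`, and the on-disk JSON cache layout are all hypothetical names chosen for the example, and `parse_fn` stands in for whatever single parsing call a real integration would make.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_parse(
    pdf_bytes: bytes,
    parse_fn: Callable[[bytes], dict],  # hypothetical stand-in for the one parsing call per document
    cache_dir: Path,
) -> dict:
    """Return the parsed document, calling parse_fn at most once per unique file content."""
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Key the cache on the document's content so identical files share one parse.
    key = hashlib.sha256(pdf_bytes).hexdigest()
    cache_file = cache_dir / f"{key}.json"

    # Reuse a stored parse if one exists.
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))

    # Otherwise parse once and store the result for later questions.
    parsed = parse_fn(pdf_bytes)
    cache_file.write_text(json.dumps(parsed), encoding="utf-8")
    return parsed
```

Under these assumptions, every later question about the same document reads the stored parse instead of paying for parsing again. The sketch also assumes the parser returns JSON-serializable data.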