Company
Date Published
Author
Quentin Macé, Antonio Loison, Antoine EDY, Victor Xing, and Gautier Viaud
Word count
2524
Language
-
Hacker News points
None

Summary

ILLUIN Technology, in collaboration with NVIDIA, has introduced ViDoRe V3, a benchmark for evaluating enterprise document retrieval systems. It targets the difficulty of retrieving accurate information from complex, visually rich documents in real-world settings, with an emphasis on multimodality, enterprise relevance, and data quality. Unlike previous benchmarks, ViDoRe V3 includes human-created and human-verified annotations. It draws on 10 datasets from different industrial domains, 8 of which are publicly available and 2 of which are kept private. The benchmark comprises 26,000 pages and 3,099 queries translated into six languages, each linked to human-verified retrieval ground truth. To reduce reliance on synthetic data, ViDoRe V3 combines advanced vision-language models with human expertise. Its assessment of current retrieval models shows that they still struggle with multilingual and technical documents, particularly in industrial and energy-related domains. The benchmark aims to provide a more realistic and challenging evaluation framework for visual retrieval systems, and it emphasizes the need for models to synthesize information from multiple pages to answer complex queries.