A framework and leaderboard for Retrieval Pipelines evaluation on ViDoRe v3
Blog post from Hugging Face
The blog post describes the ViDoRe v3 framework and leaderboard for evaluating retrieval pipelines, particularly in the context of Retrieval Augmented Generation (RAG). RAG augments Large Language Models (LLMs) with a retrieval step that finds relevant documents and injects them into the prompt as context. ViDoRe v3 serves as a benchmark for measuring how well embedding models perform on visual retrieval tasks.

The post outlines the key components of a retrieval pipeline: Optical Character Recognition (OCR), Vision-Language Models (VLMs), and search algorithms such as sparse search, dense embedding models, and late-interaction models. It argues that building a state-of-the-art retrieval system depends on choosing the components that fit the specific business and system requirements.

The ViDoRe v3 Pipeline Leaderboard, hosted on Hugging Face, lets readers compare pipeline implementations by their average accuracy and search latency. In this way, ViDoRe v3 provides a standardized framework for comparing dense, sparse, and hybrid retrieval approaches, as well as text-based and image-based retrieval methods. Finally, the post points to a shift from static pipelines to dynamic Retrieval Agents, which can improve search accuracy adaptively by rewriting queries or calling additional tools.
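To make the three search families mentioned above more concrete, here is a minimal, dependency-free Python sketch of how each one might score a query against a document. It is illustrative only and not code from the blog post or the ViDoRe v3 framework. The function names are hypothetical. Sparse search is represented by Okapi BM25 (one common sparse method, with the usual defaults k1=1.5, b=0.75). Dense scoring uses cosine similarity between one query vector and one document vector. Late interaction uses ColBERT-style MaxSim over multi-vector representations, which for an image-based retriever would be one vector per image patch. In a real pipeline the vectors would come from an embedding model; here they are toy values.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Sparse lexical scoring (Okapi BM25) over pre-tokenized documents.

    Returns one score per document, in the same order as docs_tokens.
    """
    if not docs_tokens:
        return []
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: in how many documents each term appears
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Length normalization: long documents are penalized relative to avg_len
            denom = freq + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dense_score(query_vec, doc_vec):
    """Single-vector (bi-encoder) scoring: one embedding per query and per page."""
    return cosine(query_vec, doc_vec)


def late_interaction_score(query_vecs, doc_vecs):
    """ColBERT-style MaxSim, assuming vectors are already L2-normalized.

    Each query vector is matched to its most similar document vector
    (by dot product), and these per-query-vector maxima are summed.
    """
    return sum(
        max(sum(a * b for a, b in zip(q, d)) for d in doc_vecs)
        for q in query_vecs
    )
```

For example, `bm25_scores(["revenue"], [["invoice", "total", "amount"], ["quarterly", "revenue", "chart"]])` gives 0.0 for the first document and about 0.69 (ln 2) for the second. The accuracy-versus-latency trade-off that the leaderboard reports is visible even in this toy: late interaction compares every query vector with every document vector, so it is typically more expensive per document than the single similarity computation of a dense model.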
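The summary mentions hybrid retrieval but does not say how sparse and dense results are combined. One common option, assumed here rather than taken from the post, is Reciprocal Rank Fusion (RRF). RRF merges ranked lists using only each document's rank, so scores on different scales (for example BM25 versus cosine similarity) never need to be normalized against each other. The function below is a hypothetical sketch. The constant k=60 is a commonly used default, not a value from ViDoRe v3.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of document IDs into one ranking.

    rankings: e.g. [sparse_ranking, dense_ranking], each a list of IDs.
    Each document receives sum(1 / (k + rank)) over the lists it appears in.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism
    return sorted(fused, key=lambda d: (-fused[d], d))
```

With a sparse ranking `["p3", "p1", "p2"]` and a dense ranking `["p1", "p4", "p3"]`, the fused order is `["p1", "p3", "p4", "p2"]`. Here `p1` comes first because it ranks near the top of both lists, while documents found by only one retriever still appear further down.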