Beyond Semantic Similarity: Introducing NVIDIA NeMo Retriever’s Generalizable Agentic Retrieval Pipeline
Blog post from Hugging Face
NVIDIA's NeMo Retriever team has developed an agentic retrieval pipeline that has achieved top rankings on the ViDoRe v3 and BRIGHT leaderboards, showcasing its generalizability across diverse retrieval tasks. Unlike traditional dense retrieval methods that rely on semantic similarity, this pipeline employs a ReAct architecture that lets the agent choose its search and reasoning strategy dynamically, so it adapts to different datasets without architectural changes. The agentic method bridges the gap between large language models (LLMs) and traditional retrievers through an iterative loop in which the agent generates queries, rephrases them, and breaks complex queries down into simpler ones. Although the pipeline is resource-intensive, the team improved its efficiency by replacing the Model Context Protocol (MCP) server with a thread-safe singleton retriever, which raised GPU utilization and throughput. Ablation studies demonstrate the benefits of using specialized embeddings and highlight the potential for agentic retrieval to reduce performance gaps between stronger and weaker models. Although the approach is slower and more costly than standard methods, it holds promise for complex, high-stakes queries, and work is ongoing to reduce costs and improve efficiency through smaller, specialized models.
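To make the two mechanisms in the summary concrete, here is a minimal Python sketch of an iterative search loop that calls a thread-safe singleton retriever in-process rather than through a separate server. It is not NVIDIA's implementation. Every name in it (`SingletonRetriever`, `toy_policy`, `agentic_retrieve`, `run_parallel`, `CORPUS`) is hypothetical, the token-overlap scorer stands in for an embedding model, and `toy_policy` stands in for the LLM by simply splitting the question on " and " to mimic query decomposition.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SingletonRetriever:
    """Hypothetical in-process retriever shared by every agent thread."""

    _instance = None
    _instance_lock = threading.Lock()

    def __init__(self, corpus):
        self._corpus = corpus                  # doc_id -> text
        self._search_lock = threading.Lock()   # guards the shared "model"

    @classmethod
    def get(cls, corpus=None):
        # Double-checked locking: only the first caller builds the instance.
        if cls._instance is None:
            with cls._instance_lock:
                if cls._instance is None:
                    cls._instance = cls(corpus or {})
        return cls._instance

    def search(self, query, k=2):
        # Toy scorer (assumption): number of shared lowercase tokens.
        q = set(query.lower().split())
        with self._search_lock:
            scored = [(len(q & set(text.lower().split())), doc_id)
                      for doc_id, text in self._corpus.items()]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [doc_id for score, doc_id in scored[:k] if score > 0]


def toy_policy(question, history):
    """Stand-in for the LLM: returns ("search", query) or ("finish", None)."""
    sub_queries = [part.strip() for part in question.split(" and ")]
    if len(history) < len(sub_queries):
        return "search", sub_queries[len(history)]
    return "finish", None


def agentic_retrieve(question, max_steps=5):
    """Loop: pick an action, search, record the observation, repeat."""
    retriever = SingletonRetriever.get()
    history, found = [], []
    for _ in range(max_steps):
        action, query = toy_policy(question, history)
        if action == "finish":
            break
        hits = retriever.search(query)
        history.append((query, hits))
        for doc_id in hits:
            if doc_id not in found:
                found.append(doc_id)
    return found


def run_parallel(questions, workers=4):
    # Many agent loops share one retriever instead of one server call each.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(agentic_retrieve, questions))


CORPUS = {
    "d1": "gpu utilization improves with batching",
    "d2": "singleton retriever shared across threads",
    "d3": "cooking pasta requires boiling water",
}
SingletonRetriever.get(CORPUS)  # build the shared instance once

if __name__ == "__main__":
    print(agentic_retrieve("gpu utilization and singleton retriever"))
```

In this sketch the compound question is answered with two narrower searches whose results are merged, which a single similarity lookup would not do. A real pipeline would replace the lock-guarded toy scorer with a GPU-backed embedding model and would probably batch concurrent queries; that part is left out here.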