How NVIDIA AI-Q Reached #1 on DeepResearch Bench I and II
Blog post from HuggingFace
NVIDIA's AI-Q deep research agent recently took first place on both DeepResearch Bench I and II, the primary benchmarks for evaluating deep research agents. The result marks a significant step forward for open and portable deep research. AI-Q is an open blueprint for building AI agents that reason over enterprise and web data and deliver well-cited responses, and its modular architecture lets enterprises own, inspect, customize, and configure the system for each use case.

The two benchmarks evaluate research agents differently. Bench I focuses on report quality, while Bench II focuses on factual correctness and analytical rigor. Placing first on both indicates that AI-Q can produce well-cited reports and can also accurately retrieve and synthesize information.

The AI-Q deep researcher uses a multi-agent architecture made up of a planner, a researcher, and an orchestrator. It is built on the NVIDIA NeMo Agent Toolkit and fine-tuned NVIDIA Nemotron 3 Super models, and an optional ensemble and report refiner further improve report quality. Because the architecture is modular, each component can use a different LLM, and custom middleware keeps the agent reliable over long interactions. The core stack is open and reproducible, combining multi-step research with ensemble methods to produce high-quality synthesis and citation-backed reports.
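The post does not include code, so the following is only a minimal, hypothetical Python sketch of the role split described above. It is not the NeMo Agent Toolkit API. It assumes each role (planner, researcher, orchestrator, optional refiner) is simply a function from prompt text to completion text, which is one way to let every component be backed by a different LLM. The `with_retries` wrapper is an assumed stand-in for the custom middleware; the post does not say what that middleware actually does. How ensemble drafts are combined (here, by handing all of them to the refiner) is also an assumption. All names (`ResearchConfig`, `run_deep_research`, `with_retries`) are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical: a "model" is any function from prompt text to completion text,
# so each role below can be backed by a different LLM endpoint.
LLM = Callable[[str], str]


def with_retries(llm: LLM, attempts: int = 3) -> LLM:
    """Hypothetical middleware: retry a model call that raises, up to `attempts` times."""
    def wrapped(prompt: str) -> str:
        last_error: Optional[Exception] = None
        for _ in range(attempts):
            try:
                return llm(prompt)
            except Exception as err:  # e.g. a transient timeout from a remote model
                last_error = err
        raise RuntimeError(f"model call failed after {attempts} attempts") from last_error
    return wrapped


@dataclass
class ResearchConfig:
    planner: LLM                   # splits the question into sub-questions, one per line
    researcher: LLM                # answers one sub-question with cited findings
    orchestrator: LLM              # synthesizes findings into a draft report
    refiner: Optional[LLM] = None  # optional pass that merges and polishes drafts
    ensemble_size: int = 1         # number of independent drafts to generate


def run_deep_research(question: str, cfg: ResearchConfig) -> str:
    if cfg.ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    drafts: List[str] = []
    for i in range(cfg.ensemble_size):
        # 1. Plan: the planner returns sub-questions, one per non-empty line.
        plan = cfg.planner(f"[run {i}] Plan research for: {question}")
        sub_questions = [line.strip() for line in plan.splitlines() if line.strip()]
        # 2. Research: gather findings for each sub-question.
        findings = [cfg.researcher(f"Research: {sq}") for sq in sub_questions]
        # 3. Synthesize: the orchestrator turns the findings into one draft.
        drafts.append(cfg.orchestrator("Write report from:\n" + "\n".join(findings)))
    if cfg.refiner is None:
        # Without a refiner, return the first draft unchanged.
        return drafts[0]
    # 4. Optional refinement: merge all ensemble drafts into a final report.
    return cfg.refiner("Refine and merge drafts:\n" + "\n---\n".join(drafts))


# --- Usage with stub models standing in for real LLM endpoints ---
calls = {"n": 0}


def flaky_planner(prompt: str) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("simulated transient failure")
    return "market size\nkey vendors"


cfg = ResearchConfig(
    planner=with_retries(flaky_planner),
    researcher=lambda p: f"Finding on {p.split(': ', 1)[1]} [1]",
    orchestrator=lambda p: f"Report({p.count('Finding')} findings)",
    refiner=lambda p: f"Final: {p.count('Report(')} drafts merged",
    ensemble_size=3,
)
report = run_deep_research("How large is the GPU market?", cfg)
print(report)  # Final: 3 drafts merged
```

Keeping each role behind a plain callable is what makes per-component model choice a configuration detail in this sketch: swapping a different model into the orchestrator role only changes how `ResearchConfig` is constructed, not the pipeline code.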