How NVIDIA AI-Q Reached #1 on DeepResearch Bench I and II

Post Details

Company

Hugging Face

Date Published

March 12, 2026

Author

David Austin

Word Count

1,749

Company Posts That Month

63

Language

-

Hacker News Points

-

Post removed?

No

Source URL

huggingface.co/blog/nvidia/how-nvidia-won-deepresearch-bench

Summary

NVIDIA's AI-Q deep research agent took first place on both DeepResearch Bench I and DeepResearch Bench II, the primary benchmarks for evaluating deep research agents, a significant advance for open and portable deep research. AI-Q is an open blueprint for building AI agents that reason over enterprise and web data and deliver well-cited responses; its modular architecture lets enterprises own, inspect, customize, and configure the system per use case. The AI-Q deep researcher uses a multi-agent architecture consisting of a planner, a researcher, and an orchestrator, built on the NVIDIA NeMo Agent Toolkit and fine-tuned NVIDIA Nemotron 3 Super models, with an optional ensemble and report refiner to improve report quality. The two benchmarks measure different things: Bench I focuses on report quality, while Bench II focuses on factual correctness and analytical rigor. Leading both indicates that AI-Q can produce well-cited reports and also retrieve and synthesize information accurately. Each component can use a different LLM, and custom middleware keeps the agent reliable over long interactions. The core stack is open and reproducible, and it relies on multi-step research and ensemble methods for high-quality synthesis and citation-backed reporting.
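
To make the planner, researcher, and orchestrator split more concrete, here is a minimal Python sketch of how such roles could be wired together. It is an illustration under stated assumptions, not AI-Q's implementation and not the NVIDIA NeMo Agent Toolkit API: the `Orchestrator` class, the `stub_planner` and `stub_researcher` functions, and the example.com URL are all hypothetical, and the stubs stand in for real LLM and retrieval calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical role signatures. In a real system each role could be backed
# by a different LLM; here they are plain Python callables.
Planner = Callable[[str], List[str]]            # topic -> sub-questions
Researcher = Callable[[str], Tuple[str, str]]   # question -> (answer, source)
Refiner = Callable[[str], str]                  # draft report -> refined report


@dataclass
class Orchestrator:
    """Runs plan -> research -> report, with an optional refinement pass."""
    planner: Planner
    researcher: Researcher
    refiner: Optional[Refiner] = None

    def run(self, topic: str) -> str:
        questions = self.planner(topic)
        findings = [self.researcher(q) for q in questions]
        # Attach a numbered citation marker to each answer.
        body = [f"{answer} [{i}]" for i, (answer, _) in enumerate(findings, 1)]
        refs = [f"[{i}] {source}" for i, (_, source) in enumerate(findings, 1)]
        report = "\n".join(body + ["", "References"] + refs)
        return self.refiner(report) if self.refiner else report


def stub_planner(topic: str) -> List[str]:
    # Assumption: a real planner would ask an LLM to decompose the topic.
    return [f"What is {topic}?", f"Why does {topic} matter?"]


def stub_researcher(question: str) -> Tuple[str, str]:
    # Assumption: a real researcher would search and read sources.
    return (f"Stub answer to: {question}", "https://example.com/source")


if __name__ == "__main__":
    print(Orchestrator(stub_planner, stub_researcher).run("deep research"))
```

Because each role is passed in as a separate callable, swapping the model behind one role leaves the others untouched, which mirrors the per-component flexibility the summary describes.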

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	13	6,078	960	218	+18%
Multi-agent systems	5	574	146	66	+51%
AI Agents	2	4,545	963	231	+27%
AI Model Fine-tuning	1	906	165	54	-16%
