Tipping the scales: Merging weak agents into a state-of-the-art deep researcher

Post Details

Company

AI21 Labs

Date Published

June 24, 2026

Author

AI21 Labs

Word Count

1,464

Company Posts That Month

3

Language

English

Source URL

www.ai21.com/blog/merging-weak-agents-into-a-state-of-the-art-deep-researcher

Summary

DeepResearch Bench II (DRB II) evaluates deep research agents against 9,430 expert-written rubrics across 132 tasks, with Information Recall as a key metric. Rather than building a better individual agent, the authors reached a top leaderboard score of 64.38 by merging the outputs of the agents ranked 7th to 13th, none of which scored above 45 on its own. The gain comes from coverage: different reports capture different facts, so combining them raises Information Recall. The method is agglomerative pairwise merging, in which reports are fused iteratively, with each fusion step aiming to preserve the factual information in both inputs. No new agent is developed. The result suggests that combining existing agents can outperform refining a single one, and that as more agents become available, the ability to extract their combined coverage will matter more.
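To make the merging idea concrete, here is a minimal toy sketch of agglomerative pairwise merging. It is not the authors' implementation: the summary does not say how the fusion step works or how pairs are chosen. The sketch assumes each report can be modeled as a set of fact labels, that fusion is a plain set union (`union_merge`, a hypothetical stand-in for whatever fusion step the authors use), that reports are paired in list order, and that recall is the fraction of rubric items a report covers. All names and data below are illustrative.

```python
from typing import Callable, List, Set

# Toy assumption: a report is just the set of facts it states.
Report = Set[str]


def union_merge(a: Report, b: Report) -> Report:
    """Hypothetical fusion step: keep every fact found in either report."""
    return a | b


def agglomerative_merge(
    reports: List[Report],
    merge: Callable[[Report, Report], Report] = union_merge,
) -> Report:
    """Fuse reports two at a time, round by round, until one remains."""
    if not reports:
        raise ValueError("need at least one report")
    current = list(reports)
    while len(current) > 1:
        next_round = []
        # Merge neighbours (0,1), (2,3), ... in list order.
        for i in range(0, len(current) - 1, 2):
            next_round.append(merge(current[i], current[i + 1]))
        # An odd report out is carried into the next round unchanged.
        if len(current) % 2 == 1:
            next_round.append(current[-1])
        current = next_round
    return current[0]


def recall(report: Report, rubric: Set[str]) -> float:
    """Fraction of rubric facts that the report covers."""
    return len(report & rubric) / len(rubric)


# Made-up data: five weak reports, each covering a different slice of the rubric.
rubric = {"f1", "f2", "f3", "f4", "f5", "f6"}
reports = [{"f1", "f2"}, {"f2", "f3"}, {"f4"}, {"f5", "x"}, {"f1", "f6"}]

merged = agglomerative_merge(reports)
best_single = max(recall(r, rubric) for r in reports)
merged_recall = recall(merged, rubric)
```

In this toy data no single report covers more than a third of the rubric, yet the merged report covers all of it, which mirrors the coverage argument in the summary. A real fusion step would also have to resolve contradictions and control report length, which a set union ignores.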
