Home / Companies / AI21 Labs / Blog / Post Details
Content Deep Dive

Tipping the scales: Merging weak agents into a state-of-the-art deep researcher

Blog post from AI21 Labs

Post Details
Company
Date Published
Author
AI21 Labs
Word Count
1,464
Company Posts That Month
3
Language
English
Hacker News Points
-
Summary

DeepResearch Bench II (DRB II) is a benchmark that evaluates deep research agents against 9,430 expert-written rubrics across 132 tasks, emphasizing Information Recall as a key metric. Rather than focusing on creating a superior individual agent, the authors achieved a top leaderboard score of 64.38 by merging outputs from agents ranked 7th to 13th, none of which individually scored above 45. This approach capitalized on the diverse coverage of facts across multiple reports, enhancing Information Recall and demonstrating that combining existing agents can outperform refining a single one. The method involves agglomerative pairwise merging, where reports are fused iteratively to preserve factual information, thus improving overall task performance without developing a new agent. This strategy not only highlights the potential of leveraging existing resources but also suggests that as the number of available agents grows, the ability to extract more comprehensive insights from them will become increasingly significant.

Trends Found in this Post

No tracked trend matches for this post yet.