Divide, conquer, and plan: How weaker models beat GPT-4o on long context tasks
Blog post from Together AI
The research paper "When Does Divide and Conquer Work for Long Context LLM?" explores a framework that uses a "Divide & Conquer" approach to boost the performance of smaller language models on long-context tasks, potentially surpassing larger models like GPT-4o used in a single-shot manner.

The study identifies three sources of error. As context length increases, models experience superlinear growth in confusion, termed "Model Noise." "Task Noise" arises from dependencies that span text chunks. "Aggregator Noise" affects the integration of partial answers into a final result.

By dividing long inputs into manageable chunks and processing them in parallel with smaller models, the framework offers reduced costs, faster processing, and easier tuning. It proves effective on tasks like retrieval, QA, and summarization, though it is not universally applicable, especially where significant cross-chunk dependencies exist.
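The chunk-then-aggregate pattern described above can be sketched as follows. This is a toy illustration, not the paper's implementation: `answer_chunk` stands in for a small model answering over one chunk (here, a trivial retrieval-style count), and `aggregate` stands in for the aggregator model; both names and the chunking strategy are assumptions for demonstration only.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Divide a long context into fixed-size chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def answer_chunk(chunk: str, query: str) -> int:
    """Stand-in for a small model answering over one chunk.
    Here: a toy retrieval task, counting occurrences of the query."""
    return chunk.count(query)

def aggregate(partials: list[int]) -> int:
    """Stand-in for the aggregator that merges partial answers."""
    return sum(partials)

def divide_and_conquer(text: str, query: str, chunk_size: int = 64) -> int:
    chunks = chunk_text(text, chunk_size)
    # Chunks are independent, so the per-chunk calls can run in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda c: answer_chunk(c, query), chunks))
    return aggregate(partials)

if __name__ == "__main__":
    doc = "needle " + "hay " * 100 + "needle " + "hay " * 100
    print(divide_and_conquer(doc, "needle"))
```

Note that naive fixed-boundary chunking can split an answer across two chunks, causing it to be missed entirely; this is a concrete instance of the cross-chunk dependency ("Task Noise") limitation the post highlights.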