Company:
Date Published:
Author: Ravi Theja
Word count: 1208
Language: English
Hacker News points: None

Summary

In the rapidly evolving field of large language models (LLMs), two prominent approaches to improving performance on tasks such as generative question answering and summarization are expanding context windows and augmenting models with retrieval. A recent NVIDIA study compared these approaches using powerful LLMs, including GPT-43B and LLaMA2-70B, and found that combining retrieval augmentation with an extended context window yields the largest gains. While NVIDIA's findings hold for top-tier models and suggest further improvements from retrieval, other work, such as Bai et al. (2023), reports differing results, particularly on how much retrieval helps models with different context window sizes.

The study shows that retrieval can lift the performance of models with both short and long context windows, with LLaMA2-70B-32k-ret outperforming notable models such as GPT-3.5-turbo-16k. Public retrieval systems generally outperformed proprietary ones, and the optimal number of retrieved chunks was five or ten; retrieving more could degrade performance due to the "lost in the middle" phenomenon.
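The retrieval-augmentation pattern described above can be sketched in a few lines. This is a minimal illustration, not the study's actual pipeline: the `embed` function here is a toy bag-of-words stand-in for a real dense embedding model, and all function names are hypothetical. The key design point it demonstrates is the configurable `top_k`, which the study found works best at around five to ten chunks.

```python
# Minimal sketch of retrieval augmentation with a configurable top-k.
# The embedding is a toy bag-of-words model (a stand-in for a real dense
# retriever); all names here are illustrative, not from the study's code.
import math
from collections import Counter


def embed(text):
    # Toy embedding: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, chunks, top_k=5):
    # Score every chunk against the query and keep the top_k best matches.
    # Keeping top_k around 5-10 avoids the "lost in the middle" degradation
    # seen when too many chunks are packed into a long context window.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, chunks, top_k=5):
    # Prepend the retrieved chunks as context for a long-context LLM.
    context = "\n\n".join(retrieve(query, chunks, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

The resulting prompt would then be passed to any long-context model; in the study's strongest configuration, retrieval like this was combined with a 32k-token context window rather than used as a substitute for it.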