RAG is Dead! Long Live RAG!
Blog post from Vectorize
Google's Gemini 1.5 model introduces a significant advance in AI capabilities by supporting context windows of up to 1 million tokens, a substantial jump over existing models like GPT-4 Turbo and Claude 2.1. Despite this breakthrough in handling vast amounts of data, there are concerns about its practical implications. Tests show that while Gemini 1.5 excels at recalling information within its extended context in controlled settings, real-world applications see a recall rate of around 60%, meaning a significant portion of the context can still be "lost." The model also faces high latency, cost concerns, and limited tuning options, all of which complicate its integration into AI applications.

Consequently, traditional retrieval-augmented generation (RAG) pipelines remain necessary: by retrieving only the most relevant data and keeping prompts small, they address recall, latency, and cost at once. Gemini 1.5 does not eliminate the need for data engineering and retrieval strategies.
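To make the retrieve-then-generate pattern concrete, here is a minimal sketch of the retrieval step in a RAG pipeline. This is illustrative only, not the Vectorize product or any specific pipeline: it uses a toy TF-IDF scorer in place of a real embedding model and vector database, and the corpus and function names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Crude whitespace tokenizer; a real pipeline would use embeddings."""
    return [w.lower().strip(".,!?\"'") for w in text.split()]

def tf_idf_scores(query, documents):
    """Score each document against the query with simple TF-IDF overlap."""
    n = len(documents)
    doc_tokens = [tokenize(d) for d in documents]
    df = Counter()                       # document frequency per term
    for toks in doc_tokens:
        for term in set(toks):
            df[term] += 1
    scores = []
    for toks in doc_tokens:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                idf = math.log((1 + n) / (1 + df[term])) + 1
                score += tf[term] * idf
        scores.append(score)
    return scores

def retrieve(query, documents, k=2):
    """Return the top-k documents most relevant to the query.

    This is the step that keeps the prompt small: instead of stuffing
    everything into a million-token context, only the best matches are
    passed to the model.
    """
    scores = tf_idf_scores(query, documents)
    ranked = sorted(range(len(documents)),
                    key=lambda i: scores[i], reverse=True)
    return [documents[i] for i in ranked[:k]]

# Hypothetical corpus standing in for an indexed knowledge base.
corpus = [
    "Gemini 1.5 supports context windows of up to 1 million tokens.",
    "RAG pipelines retrieve only the most relevant chunks before generation.",
    "Long prompts increase both latency and per-request cost.",
]

top = retrieve("Why do RAG pipelines keep prompts small?", corpus, k=1)
# The retrieved chunk, not the whole corpus, is placed in the prompt.
prompt = f"Context:\n{top[0]}\n\nQuestion: Why do RAG pipelines keep prompts small?"
```

The design point is the same one the post makes: the model only ever sees the retrieved chunks, so recall, latency, and cost all depend on the quality of the retrieval step rather than on raw context-window size.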