The release of Llama 4 Scout, with its 10-million-token context window, has sparked debate over whether retrieval-augmented generation (RAG) is obsolete. The article argues that RAG remains relevant, especially for applications that need access to extensive data at scale. While long context windows allow large documents to be processed in a single pass, they introduce latency, cost, and accuracy issues, making RAG valuable for supplying targeted context.

For use cases such as querying large knowledge bases, RAG is essential: the data volume exceeds the capacity of even the most advanced context windows. RAG is also more cost-effective and often faster, with the retrieval step typically completing within milliseconds. It can improve accuracy as well, since feeding the model only relevant data reduces the risk of hallucinations. As context window sizes continue to grow, RAG is expected to complement them by addressing their limitations rather than being replaced entirely.