Hugging Face Transformers has integrated the Retrieval Augmented Generation (RAG) model with Ray, a library for building scalable applications, to make distributed fine-tuning of RAG more scalable. The integration speeds up retrieval calls by 2x and improves RAG's overall fine-tuning performance on knowledge-intensive tasks. The new implementation uses Ray's stateful actor abstraction to load the document index and serve retrieval queries, overcoming limitations of earlier implementations. With this integration, users can fine-tune RAG for retrieval-based generation on their own knowledge-intensive tasks and can also tune hyperparameters with the Ray Tune library.