
Embedding Inference at Scale for RAG Applications with Ray Data and Milvus

What's this blog post about?

This blog shows how to build Retrieval Augmented Generation (RAG) applications with open-source tools such as Ray Data and Milvus. The authors highlight the performance gain Ray Data delivers during the embedding step, where data is transformed into vectors: with just four workers on a Mac M2 laptop with 16GB of RAM, Ray Data was about 60 times faster than Pandas. The post presents an open-source RAG stack consisting of the BGE-M3 embedding model, Ray Data for fast, distributed embedding inference, and the Milvus or Zilliz Cloud vector database. It walks step by step through setting up these tools and generating embeddings from a Kaggle IMDB poster dataset, and it covers the bulk-import features of Milvus and Zilliz Cloud for efficiently batch-loading vector data into the database.

Company
Zilliz

Date published
April 12, 2024

Author(s)
Christy Bergman and Cheng Su

Word count
1761

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.