Building a RAG Batch Inference Pipeline with Anyscale and Union
Blog post from Anyscale
This post demonstrates the versatility of Ray, an open-source unified compute framework, by running embedding generation and LLM batch inference with Ray inside two Flyte pipelines. Flyte is an open-source orchestrator for building production-grade data and machine learning pipelines. The post argues that AI/ML workloads benefit from pairing a unified distributed computation framework like Ray with a workflow orchestrator like Flyte. Anyscale, built by the creators of Ray, lets developers deploy AI/ML workloads at scale, while Union, built by the technical founding team behind Flyte, abstracts away the infrastructure so that ML engineers and data scientists can focus on their tasks. The post then walks through the two pipelines: the first generates embeddings with Ray Data and saves them to cloud storage shared by Union and Anyscale; the second monitors GitHub issues in Flyte repositories and uses the Anyscale Platform to serve an LLM with RAG, performing batch inference and replying to the issues.
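To make the two-pipeline shape concrete, here is a minimal sketch in plain Python. It is not the post's code and does not call Ray, Flyte, Anyscale, or Union APIs. The bag-of-words `embed` function stands in for a real embedding model, an in-memory list stands in for the shared cloud storage, and every function name (`embed`, `build_index`, `retrieve`, `build_prompts`) is hypothetical. The first stage embeds documents in batches; the second retrieves context for each issue and builds one RAG prompt per issue, ready to hand to a batch LLM inference step.

```python
import math
from collections import Counter
from typing import Dict, Iterator, List, Tuple

Vector = Dict[str, float]  # sparse vector: token -> weight


def embed(text: str) -> Vector:
    """Toy L2-normalized bag-of-words vector (assumption: stands in for a real model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_index(docs: List[str], batch_size: int = 2) -> List[Tuple[str, Vector]]:
    """Stage 1: embed documents batch by batch and collect (doc, vector) pairs."""
    index: List[Tuple[str, Vector]] = []
    for batch in batches(docs, batch_size):
        index.extend((doc, embed(doc)) for doc in batch)
    return index


def retrieve(index: List[Tuple[str, Vector]], query: str, k: int = 1) -> List[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: -cosine(q, pair[1]))
    return [doc for doc, _ in ranked[:k]]


def build_prompts(index: List[Tuple[str, Vector]], issues: List[str], k: int = 1) -> List[str]:
    """Stage 2: build one retrieval-augmented prompt per issue for batch inference."""
    prompts = []
    for issue in issues:
        context = "\n".join(retrieve(index, issue, k))
        prompts.append(f"Context:\n{context}\n\nIssue:\n{issue}\n\nReply:")
    return prompts
```

In a real deployment along the lines the post describes, the batch loop in `build_index` would presumably be replaced by a distributed map over a Ray Data dataset, and each stage would be wrapped as a task in a Flyte workflow; those details are left out here because they depend on the post's actual code.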
| Trend | Post Mentions | Total Month Mentions | Posts | Companies | MoM |
|---|---|---|---|---|---|
| Vector Search | 19 | 3,675 | 269 | 79 | +77% |
| LLM | 6 | 3,889 | 441 | 129 | +7% |
| RAG | 5 | 1,936 | 254 | 78 | -19% |
| Serverless | 1 | 647 | 170 | 80 | +31% |