
Building RAG with Milvus, vLLM, and Llama 3.1

Blog post from Zilliz

Post Details

Company: Zilliz
Date Published: -
Author: Christy Bergman
Word Count: 1,673
Language: English
Hacker News Points: -
Summary

The University of California, Berkeley has donated vLLM, a fast and easy-to-use library for LLM inference and serving, to the LF AI & Data Foundation as an incubation-stage project. Large Language Models (LLMs) and vector databases are commonly paired to build Retrieval Augmented Generation (RAG), a popular AI application architecture for addressing AI hallucinations. This blog demonstrates how to build and run a RAG application with Milvus, vLLM, and Llama 3.1. The process includes embedding and storing text as vector embeddings in Milvus, using this vector store as a knowledge base to efficiently retrieve text chunks relevant to user questions, and leveraging vLLM to serve Meta's Llama 3.1-8B model to generate answers augmented by the retrieved text.
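
The sketch below illustrates the pipeline the summary describes: embed and store chunks in Milvus, retrieve the chunks relevant to a question, then generate an answer with Llama 3.1-8B served through vLLM. It is a minimal example, not the post's exact code; the collection name, embedding model (BAAI/bge-small-en-v1.5), and sample documents are illustrative assumptions.

```python
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer
from vllm import LLM, SamplingParams

# 1. Embed and store text chunks as vector embeddings in Milvus.
#    Model and documents are placeholders; bge-small-en-v1.5 emits 384-dim vectors.
encoder = SentenceTransformer("BAAI/bge-small-en-v1.5")
docs = [
    "Milvus is an open-source vector database built for GenAI applications.",
    "vLLM is a fast and easy-to-use library for LLM inference and serving.",
]
client = MilvusClient("rag_demo.db")  # Milvus Lite, local file-backed
client.create_collection(collection_name="rag_chunks", dimension=384)
client.insert(
    collection_name="rag_chunks",
    data=[
        {"id": i, "vector": encoder.encode(d).tolist(), "text": d}
        for i, d in enumerate(docs)
    ],
)

# 2. Use the vector store as a knowledge base: retrieve chunks relevant
#    to the user's question by embedding the question the same way.
question = "What is vLLM?"
hits = client.search(
    collection_name="rag_chunks",
    data=[encoder.encode(question).tolist()],
    limit=2,
    output_fields=["text"],
)
context = "\n".join(hit["entity"]["text"] for hit in hits[0])

# 3. Serve Llama 3.1-8B with vLLM and generate an answer augmented
#    by the retrieved text.
llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")
prompt = (
    "Answer the question using only the context below.\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}\nAnswer:"
)
outputs = llm.generate([prompt], SamplingParams(temperature=0.2, max_tokens=256))
print(outputs[0].outputs[0].text)
```

Running this end to end requires a GPU with enough memory for the 8B model and access to the gated Llama 3.1 weights on Hugging Face; the Milvus and retrieval steps work on their own without vLLM installed.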