
Building RAG with Milvus, vLLM, and Llama 3.1

Blog post from Zilliz

Post Details

Company: Zilliz
Date Published: -
Author: Christy Bergman
Word Count: 1,673
Language: English
Hacker News Points: -
Summary

The University of California, Berkeley has donated vLLM, a fast and easy-to-use library for LLM inference and serving, to the LF AI & Data Foundation as an incubation-stage project. Large Language Models (LLMs) and vector databases are commonly paired to build Retrieval Augmented Generation (RAG), a popular AI application architecture for addressing AI hallucinations. This blog demonstrates how to build and run a RAG application with Milvus, vLLM, and Llama 3.1. The process includes embedding and storing text as vector embeddings in Milvus, using this vector store as a knowledge base to efficiently retrieve text chunks relevant to user questions, and leveraging vLLM to serve Meta's Llama 3.1-8B model to generate answers augmented by the retrieved text.
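
The sketch below illustrates the pipeline the summary describes: embed and store chunks in Milvus, retrieve the chunks relevant to a question, then generate an answer with Llama 3.1-8B served through vLLM. It is a minimal example, not the post's exact code; the collection name, embedding model (BAAI/bge-small-en-v1.5), and sample documents are illustrative assumptions.

```python
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer
from vllm import LLM, SamplingParams

# 1. Embed and store text chunks as vector embeddings in Milvus.
#    Model and documents are placeholders; bge-small-en-v1.5 emits 384-dim vectors.
encoder = SentenceTransformer("BAAI/bge-small-en-v1.5")
docs = [
    "Milvus is an open-source vector database built for GenAI applications.",
    "vLLM is a fast and easy-to-use library for LLM inference and serving.",
]
client = MilvusClient("rag_demo.db")  # Milvus Lite, local file-backed
client.create_collection(collection_name="rag_chunks", dimension=384)
client.insert(
    collection_name="rag_chunks",
    data=[
        {"id": i, "vector": encoder.encode(d).tolist(), "text": d}
        for i, d in enumerate(docs)
    ],
)

# 2. Use the vector store as a knowledge base: retrieve chunks relevant
#    to the user's question by embedding the question the same way.
question = "What is vLLM?"
hits = client.search(
    collection_name="rag_chunks",
    data=[encoder.encode(question).tolist()],
    limit=2,
    output_fields=["text"],
)
context = "\n".join(hit["entity"]["text"] for hit in hits[0])

# 3. Serve Llama 3.1-8B with vLLM and generate an answer augmented
#    by the retrieved text.
llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")
prompt = (
    "Answer the question using only the context below.\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}\nAnswer:"
)
outputs = llm.generate([prompt], SamplingParams(temperature=0.2, max_tokens=256))
print(outputs[0].outputs[0].text)
```

Running this end to end requires a GPU with enough memory for the 8B model and access to the gated Llama 3.1 weights on Hugging Face; the Milvus and retrieval steps work on their own without vLLM installed.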