
RAG Without OpenAI: BentoML, OctoAI and Milvus

What's this blog post about?

This tutorial shows how to build retrieval-augmented generation (RAG) applications on large language models (LLMs) without relying on OpenAI. The workflow has four steps: serving embeddings with BentoML, inserting the embedded data into a vector database, setting up an LLM, and giving that LLM its instructions. Three components do the work: BentoML serves the embeddings, OctoAI provides access to open-source models, and Milvus is the vector database. The example uses BentoML's Sentence Transformers Embeddings repository, a local Milvus instance started with Docker Compose, and the Nous Hermes fine-tuned Mixtral model hosted on OctoAI.
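The four steps can be sketched end to end in plain Python. This is only an illustration of the data flow, not the tutorial's code: `embed`, `InMemoryStore`, `build_prompt`, and `answer` are hypothetical names, the hashed bag-of-words embedding stands in for the model served by BentoML, the brute-force store stands in for Milvus, and the `llm` argument is any callable standing in for the model hosted on OctoAI. None of the calls below belong to those libraries.

```python
import math
import re
import zlib

DIM = 256  # toy embedding size, chosen arbitrarily for this sketch


def embed(text, dim=DIM):
    """Toy stand-in for the embedding service: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike the built-in hash() for strings
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class InMemoryStore:
    """Toy stand-in for the vector database: brute-force cosine search."""

    def __init__(self):
        self.rows = []  # list of (vector, original text)

    def insert(self, texts):
        for text in texts:
            self.rows.append((embed(text), text))

    def search(self, query, top_k=2):
        q = embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), text) for v, text in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:top_k]]


def build_prompt(question, context_chunks):
    """Combine the instructions, the retrieved context, and the question."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


def answer(question, store, llm):
    """Retrieve context for the question, then hand the prompt to the LLM callable."""
    chunks = store.search(question)
    return llm(build_prompt(question, chunks))


if __name__ == "__main__":
    store = InMemoryStore()
    store.insert([
        "Milvus stores vectors",
        "BentoML serves embeddings",
        "OctoAI hosts open-source LLMs",
    ])
    # A placeholder "LLM" that just echoes the prompt it receives.
    print(answer("What does Milvus store?", store, llm=lambda prompt: prompt))
```

In a real application each stand-in would be replaced by a network call: an HTTP request to the embedding service, a search against the vector database collection, and a chat completion request to the hosted model. The shape of `answer` (retrieve, build prompt, generate) would stay the same.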


Date published
April 23, 2024

By Yujian Tang



By Matt Makai. 2021-2024.