What is Retrieval Augmented Generation?
Blog post from Roboflow
Retrieval Augmented Generation (RAG) is a technique for improving the responses of Large Language Models (LLMs) and Large Multimodal Models (LMMs) by giving them contextually relevant information, text or images, retrieved from a database. It addresses a fundamental limitation of LLMs: they are expensive to train, so they are retrained infrequently and their knowledge is often out of date. RAG uses a vector database to run a semantic search for documents or images related to a query. The retrieved results are then added to the model's prompt so that it can give more precise and relevant responses.

RAG was first used with text, but it has since been applied to computer vision. Because a multimodal model's query can include visual references, RAG can be used to build systems for defect detection, logo recognition, and few-shot labeling. This approach significantly improves the performance and adaptability of multimodal models such as GPT-4V by letting them draw on up-to-date, task-specific context in their analysis.
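The retrieve-then-prompt loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Roboflow's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a real text or image embedding model, `InMemoryVectorStore` is a hypothetical list-backed substitute for a vector database, and the vocabulary, documents, and prompt wording are made up for the example. It embeds the stored documents, ranks them by cosine similarity to the embedded query, and places the top match in the prompt.

```python
import math
from dataclasses import dataclass, field


def toy_embed(text: str, vocab: list[str]) -> list[float]:
    """Hypothetical stand-in for an embedding model: counts each vocabulary word."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryVectorStore:
    """Hypothetical minimal vector store holding (vector, payload) pairs in a list."""
    items: list[tuple[list[float], str]] = field(default_factory=list)

    def add(self, vector: list[float], payload: str) -> None:
        self.items.append((vector, payload))

    def search(self, query: list[float], k: int = 2) -> list[str]:
        # Score every stored item against the query and keep the k most similar.
        ranked = sorted(self.items, key=lambda item: cosine(query, item[0]), reverse=True)
        return [payload for _, payload in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Place the retrieved context in the prompt ahead of the user's question."""
    context_block = "\n".join(f"- {c}" for c in context)
    return f"Use the following context to answer.\nContext:\n{context_block}\nQuestion: {question}"


# Illustrative data only.
vocab = ["scratch", "dent", "logo", "label", "defect"]
docs = [
    "A scratch on the casing is a cosmetic defect",
    "A dent in the panel is a structural defect",
    "The logo must appear on the front label",
]
store = InMemoryVectorStore()
for doc in docs:
    store.add(toy_embed(doc, vocab), doc)

question = "Is this scratch a defect"
top = store.search(toy_embed(question, vocab), k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Scoring every stored vector on each query is linear in the number of items, which is fine for a sketch; production vector databases typically use approximate nearest-neighbor indexes to scale. For the visual use cases, the same pattern applies with image embeddings, and the retrieved reference images are attached to the multimodal model's prompt instead of text snippets.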