Company:
Date Published:
Author: LlamaIndex
Word count: 1151
Language: English
Hacker News points: None

Summary

The blog post discusses how to evaluate Multi-Modal Retrieval-Augmented Generation (RAG) systems by building on traditional text-only RAG evaluation and adapting it to additional modalities such as images. It emphasizes evaluating the retrieval and generation stages separately, using metrics such as relevancy and faithfulness that must now account for both text and visual contexts. The generation stage is judged with Large Multi-Modal Models (LMMs), an approach termed LMM-As-A-Judge, to check that generated responses are grounded in the retrieved multi-modal context. The post acknowledges potential issues with LMM judges, such as hallucinations, and stresses the need for care when relying on them in production. It also points to other dimensions worth evaluating, such as alignment and safety, and provides links to practical guides and documentation for building and evaluating Multi-Modal RAG systems.
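To make the LMM-As-A-Judge idea concrete, below is a minimal sketch of a faithfulness check for the generation stage: a vision-capable model is shown the query, the generated answer, and the retrieved text and image contexts, and asked whether the answer is supported by them. This is an illustrative sketch, not the post's exact code; the judge model name ("gpt-4o"), the prompt wording, and the helper functions `_image_block` and `judge_faithfulness` are assumptions, and it uses the OpenAI Python client (v1+) directly rather than any specific LlamaIndex evaluator class.

```python
# Minimal LMM-as-a-judge faithfulness check for multi-modal RAG (illustrative sketch).
# Assumptions: OpenAI Python client v1+, OPENAI_API_KEY set, local PNG images as context.
import base64

from openai import OpenAI

client = OpenAI()


def _image_block(path: str) -> dict:
    """Encode a local image so it can be sent to a vision-capable judge model."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}


def judge_faithfulness(
    query: str,
    response: str,
    text_contexts: list[str],
    image_paths: list[str],
    model: str = "gpt-4o",  # assumed judge model; swap for any vision-capable LMM
) -> bool:
    """Ask the LMM judge whether the response is supported by BOTH the text and image contexts."""
    prompt = (
        "You are evaluating a multi-modal RAG system.\n"
        f"Query: {query}\n"
        f"Response: {response}\n"
        "Retrieved text context:\n"
        + "\n---\n".join(text_contexts)
        + "\nThe attached images are the retrieved visual context.\n"
        "Is the response fully supported by the text and images? Answer YES or NO."
    )
    content = [{"type": "text", "text": prompt}] + [_image_block(p) for p in image_paths]
    result = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": content}],
    )
    return result.choices[0].message.content.strip().upper().startswith("YES")
```

A relevancy check follows the same pattern with a different question (does the response actually answer the query given the retrieved context?). LlamaIndex ships multi-modal relevancy and faithfulness evaluators that package this judge pattern behind a common interface; the guides linked from the post show how to run them over a full evaluation dataset rather than one example at a time.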