Patronus AI is an automated evaluation platform for large language models (LLMs) that lets engineers score and benchmark LLM performance on real-world scenarios, generate adversarial test cases, monitor hallucinations, and detect leaks of sensitive information. The company has partnered with MongoDB Atlas to offer managed evaluation services, test suites, and adversarial datasets that help customers verify the reliability of RAG systems built on top of MongoDB Atlas. Patronus AI's research found that widely used state-of-the-art LLMs frequently hallucinate, incorrectly answering or refusing to answer up to 81% of financial analysts' questions. To help developers evaluate and improve their RAG systems, the company provides a 10-minute guide covering techniques such as exploring different indexes, adjusting document chunk sizes, re-engineering prompts, and fine-tuning the embedding model itself.
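One of the tuning knobs mentioned in the guide, adjusting document chunk sizes, can be sketched as a simple parameter sweep. The chunker and the scoring function below are hypothetical, simplified stand-ins for illustration only; they are not the Patronus AI or MongoDB Atlas APIs, and a real evaluation would retrieve via a vector index and score answers with an LLM judge or an evaluation platform:

```python
# Hypothetical sketch: sweeping chunk sizes when tuning a RAG pipeline.
# All names here (chunk_text, score_retrieval) are illustrative stand-ins.

def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character chunks with optional overlap."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

def score_retrieval(chunks: list[str], answer_span: str) -> float:
    """Toy proxy metric: fraction of chunks containing the gold answer span.
    A real pipeline would embed chunks, retrieve top-k, and judge answers."""
    if not chunks:
        return 0.0
    return sum(1 for c in chunks if answer_span in c) / len(chunks)

# Tiny synthetic "financial document" and a gold answer span to look for.
document = "Revenue grew 12% year over year across all segments. " * 40
answer_span = "12%"

# Sweep candidate chunk sizes and keep the best-scoring configuration.
results = {
    size: score_retrieval(chunk_text(document, size, overlap=16), answer_span)
    for size in (64, 128, 256, 512)
}
best_size = max(results, key=results.get)
print(f"best chunk size: {best_size} (score {results[best_size]:.2f})")
```

In practice each configuration would be scored against a held-out question set rather than a single answer span, and the same sweep pattern extends to the guide's other knobs (index type, prompt variants, embedding models).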