Patronus AI is an automated evaluation platform for large language models (LLMs) that lets engineers score and benchmark LLM performance on real-world scenarios, generate adversarial test cases, monitor hallucinations, and detect leaks of sensitive information. The company has partnered with MongoDB Atlas to offer managed evaluation services, test suites, and adversarial datasets that help customers verify the reliability of RAG systems built on top of MongoDB Atlas. Patronus AI's research found that widely used state-of-the-art LLMs frequently hallucinate, incorrectly answering or refusing to answer up to 81% of financial analysts' questions. To help developers evaluate and improve their RAG systems, the company provides a 10-minute guide covering techniques such as exploring different indexes, adjusting document chunk sizes, re-engineering prompts, and fine-tuning the embedding model itself.
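One of the tuning knobs mentioned in the guide, adjusting document chunk sizes, can be sketched as a simple parameter sweep. The chunker and the scoring function below are hypothetical, simplified stand-ins for illustration only; they are not the Patronus AI or MongoDB Atlas APIs, and a real evaluation would retrieve via a vector index and score answers with an LLM judge or an evaluation platform:

```python
# Hypothetical sketch: sweeping chunk sizes when tuning a RAG pipeline.
# All names here (chunk_text, score_retrieval) are illustrative stand-ins.

def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character chunks with optional overlap."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

def score_retrieval(chunks: list[str], answer_span: str) -> float:
    """Toy proxy metric: fraction of chunks containing the gold answer span.
    A real pipeline would embed chunks, retrieve top-k, and judge answers."""
    if not chunks:
        return 0.0
    return sum(1 for c in chunks if answer_span in c) / len(chunks)

# Tiny synthetic "financial document" and a gold answer span to look for.
document = "Revenue grew 12% year over year across all segments. " * 40
answer_span = "12%"

# Sweep candidate chunk sizes and keep the best-scoring configuration.
results = {
    size: score_retrieval(chunk_text(document, size, overlap=16), answer_span)
    for size in (64, 128, 256, 512)
}
best_size = max(results, key=results.get)
print(f"best chunk size: {best_size} (score {results[best_size]:.2f})")
```

In practice each configuration would be scored against a held-out question set rather than a single answer span, and the same sweep pattern extends to the guide's other knobs (index type, prompt variants, embedding models).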