The Arize team has launched LibreEval, an open-source project for evaluating hallucinations more accurately and affordably, addressing the high cost of running Large Language Model (LLM) evaluations at scale. LibreEval pairs the largest public hallucination dataset to date with fine-tuned models for hallucination detection. The dataset contains 70K examples designed to evaluate Retrieval-Augmented Generation (RAG) systems on context adherence, and offers multilingual coverage, both synthetic and real-world hallucinations, and consensus labeling. The fine-tuned models are compact, cost-efficient, and performant, with inference costs roughly 10x lower than GPT-4's. By combining open data, fine-tuned small models, and continuous feedback loops, LibreEval offers a path toward scalable, trustworthy LLM monitoring.
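
To make the workflow concrete, here is a minimal sketch of how a compact hallucination detector of this kind could be invoked on a RAG triple of context, question, and answer. The model ID and prompt template below are placeholders, not confirmed LibreEval artifact names; the actual checkpoint and input format should be taken from the project's model card.

```python
# Minimal sketch: scoring a RAG answer for context adherence with a
# fine-tuned classifier. The model ID is a hypothetical placeholder,
# not a confirmed LibreEval release name.
from transformers import pipeline

MODEL_ID = "your-org/libreeval-hallucination-detector"  # placeholder

detector = pipeline("text-classification", model=MODEL_ID)

context = "The Eiffel Tower is 330 metres tall and located in Paris."
question = "How tall is the Eiffel Tower?"
answer = "The Eiffel Tower is 500 metres tall."

# The prompt layout is an assumption; follow the template the model
# was actually fine-tuned on.
prompt = f"Context: {context}\nQuestion: {question}\nAnswer: {answer}"

result = detector(prompt)[0]
print(result)  # e.g. {'label': 'hallucinated', 'score': 0.97}
```

Because the detector is a small classifier rather than a general-purpose LLM judge, a call like this can run on modest hardware inside a monitoring loop, which is what makes the roughly 10x cost reduction over GPT-4-based evaluation plausible at scale.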