
HHEM v2: A New and Improved Factual Consistency Scoring Model

Blog post from Vectara

Post Details
Company: Vectara
Date Published:
Author: Forrest Bao, Miaoran Li and Rogger Luo
Word Count: 1,765
Language: English
Hacker News Points: -
Summary

Hallucinations in generative AI, particularly in Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems, are a significant challenge: models produce outputs that are not grounded in the input data, undermining the reliability of these technologies. Traditional detection approaches, such as using LLMs as judges, are often costly, slow, and inaccurate. Vectara's open-source Hughes Hallucination Evaluation Model (HHEM), which has been widely downloaded, addresses this by providing a factual consistency score that is efficient to compute and multilingual, supporting English, German, and French. HHEM v2 improves on the original by producing calibrated scores with a probabilistic interpretation, detecting factual inconsistencies more accurately while keeping latency low, making it far more efficient than much larger models such as GPT-3.5. Evaluated against established benchmarks like AggreFact and RAGTruth, which highlight how difficult accurate hallucination detection is, HHEM v2 delivers superior performance in assessing factual consistency, strengthening trust in generative AI outputs and giving enterprises a practical tool for reliable AI.
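
To make the scoring idea concrete, here is a minimal sketch of how (source, generated text) pairs could be scored with the openly released HHEM checkpoint on Hugging Face (vectara/hallucination_evaluation_model). The custom predict(pairs) interface loaded via trust_remote_code, the example texts, and the 0.5 cut-off are assumptions drawn from the open model card for illustration; they are not a definitive description of Vectara's hosted HHEM v2.

```python
# Sketch: score (source, generated claim) pairs for factual consistency.
# Assumes the open HHEM checkpoint and its custom `predict` method
# (loaded with trust_remote_code=True); the hosted HHEM v2 API may differ.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)

# Each pair is (evidence/source text, model-generated claim to check).
pairs = [
    ("The capital of France is Paris.", "Paris is the capital of France."),
    ("The capital of France is Paris.", "Berlin is the capital of France."),
]

# Calibrated scores read roughly as probabilities: values near 1.0 suggest
# the claim is consistent with the source, values near 0.0 suggest a
# hallucination. The 0.5 threshold below is illustrative only; the right
# cut-off is application-specific.
scores = model.predict(pairs)
for (source, claim), score in zip(pairs, scores):
    score = float(score)
    label = "consistent" if score >= 0.5 else "possible hallucination"
    print(f"{score:.3f}  {label}: {claim}")
```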