Company: Vectara
Date Published:
Author: Simon Hughes
Word count: 2305
Language: English
Hacker News points: None

Summary

Vectara has launched the Hughes Hallucination Evaluation Model (HHEM) to compare hallucination rates across leading Large Language Models (LLMs), including models from OpenAI, Cohere, Google (PaLM), Anthropic (Claude 2), and others. The evaluation targets the Grounded Generation setting, also known as Retrieval Augmented Generation (RAG), in which an LLM's responses are grounded in an existing knowledge source to reduce hallucinations. HHEM was used to score the summaries each LLM produced for a large set of documents, and the results show that some models with lower answer rates were nonetheless among the highest-hallucinating models. The results also indicate that the ability to correctly reject unanswerable content is correlated with the ability to produce a factually consistent summary, and that the PaLM models differ significantly in response length from the other models. The model is intended to make it practical to evaluate LLMs by hallucination rate and is expected to improve over time, with plans to integrate it into Vectara's platform and to add further leaderboards measuring hallucinations in other RAG tasks.
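
To make the evaluation concrete, the sketch below shows one way HHEM-style scoring could be applied to (source document, generated summary) pairs. It assumes the publicly released Hugging Face checkpoint `vectara/hallucination_evaluation_model` and the `sentence-transformers` CrossEncoder interface; the 0.5 cutoff is an illustrative threshold for counting a summary as hallucinated, not necessarily the exact one used for the leaderboard.

```python
# Minimal sketch: score summaries for factual consistency against their source
# documents and aggregate into a hallucination rate.
# Assumptions: the "vectara/hallucination_evaluation_model" checkpoint and the
# sentence-transformers CrossEncoder API; the 0.5 cutoff is illustrative.
from sentence_transformers import CrossEncoder

model = CrossEncoder("vectara/hallucination_evaluation_model")

# Each pair is (source document, model-generated summary).
pairs = [
    ("The capital of France is Paris.", "Paris is the capital of France."),
    ("The capital of France is Paris.", "The capital of France is Berlin."),
]

# predict() returns one consistency score per pair; higher means the summary
# is better supported by its source document.
scores = model.predict(pairs)

# Count summaries scoring below the cutoff as hallucinations.
hallucinated = [score < 0.5 for score in scores]
hallucination_rate = sum(hallucinated) / len(hallucinated)
print(f"scores={scores}, hallucination rate={hallucination_rate:.0%}")
```

In a leaderboard-style run, the same scoring would be repeated per LLM over the full document set, alongside the answer rate (the fraction of documents the model agreed to summarize) and the average summary length.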