Company: Vectara
Date Published:
Author: Simon Hughes
Word count: 2305
Language: English
Hacker News points: None

Summary

Vectara has launched the Hughes Hallucination Evaluation Model (HHEM) to compare hallucination rates across leading Large Language Models (LLMs), including models from OpenAI, Cohere, Google (PaLM), Anthropic (Claude 2), and others. The evaluation targets the Grounded Generation setting, also known as Retrieval Augmented Generation (RAG), in which an LLM's responses are grounded in an existing knowledge source to reduce hallucinations. HHEM was used to score the summaries each LLM produced for a large set of documents, and the results show that some models with lower answer rates were nonetheless among the highest-hallucinating models. The results also indicate that the ability to correctly reject unanswerable content is correlated with the ability to produce a factually consistent summary, and that the PaLM models differ significantly in response length from the other models. The model is intended to make it practical to evaluate LLMs by hallucination rate and is expected to improve over time, with plans to integrate it into Vectara's platform and to add further leaderboards measuring hallucinations in other RAG tasks.
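
To make the evaluation concrete, the sketch below shows one way HHEM-style scoring could be applied to (source document, generated summary) pairs. It assumes the publicly released Hugging Face checkpoint `vectara/hallucination_evaluation_model` and the `sentence-transformers` CrossEncoder interface; the 0.5 cutoff is an illustrative threshold for counting a summary as hallucinated, not necessarily the exact one used for the leaderboard.

```python
# Minimal sketch: score summaries for factual consistency against their source
# documents and aggregate into a hallucination rate.
# Assumptions: the "vectara/hallucination_evaluation_model" checkpoint and the
# sentence-transformers CrossEncoder API; the 0.5 cutoff is illustrative.
from sentence_transformers import CrossEncoder

model = CrossEncoder("vectara/hallucination_evaluation_model")

# Each pair is (source document, model-generated summary).
pairs = [
    ("The capital of France is Paris.", "Paris is the capital of France."),
    ("The capital of France is Paris.", "The capital of France is Berlin."),
]

# predict() returns one consistency score per pair; higher means the summary
# is better supported by its source document.
scores = model.predict(pairs)

# Count summaries scoring below the cutoff as hallucinations.
hallucinated = [score < 0.5 for score in scores]
hallucination_rate = sum(hallucinated) / len(hallucinated)
print(f"scores={scores}, hallucination rate={hallucination_rate:.0%}")
```

In a leaderboard-style run, the same scoring would be repeated per LLM over the full document set, alongside the answer rate (the fraction of documents the model agreed to summarize) and the average summary length.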