
Why does Deepseek-R1 hallucinate so much?

Blog post from Vectara

Post Details

Company: Vectara
Date Published: -
Author: Chenyu Xu and Ofer Mendelevitch
Word Count: 1,501
Language: English
Hacker News Points: -
Summary

DeepSeek R1, a recent reasoning-focused language model, hallucinates at a rate of 14.3%, far higher than the 3.9% of its predecessor, DeepSeek V3. Reasoning itself does not appear to be the primary cause of the increase; rather, R1 tends to "overhelp" by adding factually correct information that is absent from the source material. The result is a large number of benign hallucinations: claims that are factually accurate but unsupported by the source text. Validation experiments with human annotators confirmed that R1's outputs were marked as hallucinated more often than V3's, and that 71.7% of R1's hallucinations were benign. The study also found that HHEM (Vectara's Hughes Hallucination Evaluation Model) detects these benign hallucinations far more reliably than LLM-as-a-judge methods, which often fail to identify them. The authors conclude that while DeepSeek R1's training may need revision to reduce hallucinations, HHEM proves valuable for hallucination detection and outperforms LLM-based judging for this task.
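To make the notion of a "benign hallucination" concrete, here is a toy sketch (not Vectara's HHEM, which is a trained evaluation model): it flags summary sentences whose content words are mostly absent from the source passage. A naive lexical check like this cannot judge factual accuracy, which is exactly why a true-but-unsupported claim still gets flagged as a hallucination against the source.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase alphabetic tokens from a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def unsupported_sentences(source: str, summary: str) -> list[str]:
    """Flag summary sentences poorly grounded in the source.

    Toy heuristic: a sentence is 'unsupported' if fewer than half of
    its content words (length > 3) appear anywhere in the source,
    even when the sentence is factually true.
    """
    src = tokens(source)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        content = [w for w in re.findall(r"[a-z]+", sentence.lower())
                   if len(w) > 3]
        if content and sum(w in src for w in content) / len(content) < 0.5:
            flagged.append(sentence)
    return flagged

source = "The Eiffel Tower is in Paris. It was completed in 1889."
summary = ("The Eiffel Tower is in Paris. "
           "Paris is the capital of France.")  # true, but not in the source
print(unsupported_sentences(source, summary))
# → ['Paris is the capital of France.']
```

The second summary sentence is factually correct yet unsupported by the passage, mirroring the "benign hallucination" category described above; production detectors like HHEM use trained models rather than word overlap.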