DeepSeek-R1 hallucinates more than DeepSeek-V3
Blog post from Vectara
DeepSeek's newly released reasoning model, DeepSeek-R1, has drawn widespread attention for its impressive reasoning capabilities and its cost-effectiveness relative to OpenAI's o1 model, despite debate over its reported $5.5 million training cost. The model is open-sourced under an MIT license.

However, evaluations using Vectara's HHEM model and Google's FACTS methodology show that DeepSeek-R1 hallucinates at a rate of 14.3%, significantly higher than its predecessor, DeepSeek-V3. The analysis reveals that while DeepSeek-R1 remains consistent on most samples, it produces more borderline hallucinations, leading to higher variability in its scores.

Comparisons with the GPT series suggest that reasoning-enhanced models may carry an inherent trade-off between reasoning ability and faithfulness, although the GPT models appear to balance the two better than the DeepSeek models do. These findings highlight the importance of careful training to mitigate hallucination risk and underscore the ongoing need for progress in reasoning-model development.
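To make the evaluation concrete, here is a minimal sketch of how a hallucination rate can be computed from per-sample factual-consistency scores of the kind an HHEM-style evaluator produces. The threshold of 0.5 and the sample scores are illustrative assumptions, not numbers from the evaluation described above; scores near 1.0 indicate a faithful summary, and "borderline" samples are those hovering near the threshold.

```python
def hallucination_rate(scores, threshold=0.5):
    """Fraction of summaries whose consistency score falls below the threshold.

    scores: per-summary factual-consistency scores in [0, 1],
            as an HHEM-style evaluator might emit (assumption).
    """
    flagged = sum(1 for s in scores if s < threshold)
    return flagged / len(scores)

# Illustrative scores: most samples are clearly consistent,
# a few sit near the 0.5 boundary (the "borderline" cases that
# drive score variability between evaluation runs).
sample_scores = [0.98, 0.95, 0.91, 0.48, 0.52, 0.30, 0.88]
rate = hallucination_rate(sample_scores)
print(f"Hallucination rate: {rate:.1%}")  # 2 of 7 scores fall below 0.5
```

Borderline scores like 0.48 and 0.52 illustrate why small scoring perturbations can shift the measured rate: a sample just under the threshold flips the count, which is one plausible source of the higher variability observed for DeepSeek-R1.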