DeepSeek-R1 hallucinates more than DeepSeek-V3
Blog post from Vectara
DeepSeek's newly released reasoning model, DeepSeek-R1, has drawn widespread attention for its impressive reasoning capabilities and its cost-effectiveness relative to OpenAI's o1 model, despite debate over its reported $5.5 million training cost. The model is open-sourced under an MIT license.

However, evaluations using Vectara's HHEM model and Google's FACTS methodology show that DeepSeek-R1 hallucinates at a rate of 14.3%, significantly higher than its predecessor, DeepSeek-V3. The analysis reveals that while DeepSeek-R1 remains consistent on most samples, it produces more borderline hallucinations, leading to higher variability in its scores.

Comparisons with the GPT series suggest that reasoning-enhanced models may carry an inherent trade-off between reasoning ability and faithfulness, although the GPT models appear to balance the two better than the DeepSeek models do. These findings highlight the importance of careful training to mitigate hallucination risk and underscore the ongoing need for progress in reasoning-model development.
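To make the evaluation concrete, here is a minimal sketch of how a hallucination rate can be computed from per-sample factual-consistency scores of the kind an HHEM-style evaluator produces. The threshold of 0.5 and the sample scores are illustrative assumptions, not numbers from the evaluation described above; scores near 1.0 indicate a faithful summary, and "borderline" samples are those hovering near the threshold.

```python
def hallucination_rate(scores, threshold=0.5):
    """Fraction of summaries whose consistency score falls below the threshold.

    scores: per-summary factual-consistency scores in [0, 1],
            as an HHEM-style evaluator might emit (assumption).
    """
    flagged = sum(1 for s in scores if s < threshold)
    return flagged / len(scores)

# Illustrative scores: most samples are clearly consistent,
# a few sit near the 0.5 boundary (the "borderline" cases that
# drive score variability between evaluation runs).
sample_scores = [0.98, 0.95, 0.91, 0.48, 0.52, 0.30, 0.88]
rate = hallucination_rate(sample_scores)
print(f"Hallucination rate: {rate:.1%}")  # 2 of 7 scores fall below 0.5
```

Borderline scores like 0.48 and 0.52 illustrate why small scoring perturbations can shift the measured rate: a sample just under the threshold flips the count, which is one plausible source of the higher variability observed for DeepSeek-R1.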