When AI “Assures” Without Evidence: Lessons from Deloitte’s $290K Hallucination
Blog post from Vectara
Deloitte Australia used generative AI, reportedly GPT-4o via Azure OpenAI, to help draft a 237-page report for the Australian government. When the report was found to contain fabricated citations, misquoted legal judgments, and invented academic references, Deloitte refunded A$290,000.

The episode is a textbook case of citation hallucination: the model manufactures the appearance of credibility by inventing sources. In fields that depend on traceable source verification, that is a disqualifying flaw, not a cosmetic one. And the failure was not the model's alone; there were no organizational controls in place to validate citation quality before publication.

Retrieval-Augmented Generation (RAG) systems are designed to reduce this risk by grounding responses in verifiable source documents. They are not a complete fix: retrieval typically ranks documents by cosine similarity between embeddings, which can surface passages that are topically similar to the query but irrelevant to the claim being made (see the sketch below).

Frameworks like Open RAG Eval close part of the remaining gap by measuring how well generated content is grounded in the retrieved sources. The broader lesson is the same either way: verification has to be built into the pipeline before AI output can be considered reliable and enterprise-ready.
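Here is a minimal sketch of that retrieval step, the point where irrelevant-but-similar documents can slip in. The embedding model name and the toy documents are illustrative assumptions, not part of the incident; any sentence-embedding model behaves the same way.

```python
# Embedding-based retrieval ranked by cosine similarity.
# A document can score high because it shares vocabulary with the
# query ("audit") while being useless as evidence for the claim.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

docs = [
    "Section 501 of the Act requires annual compliance audits.",
    "The 2019 review recommended stronger audit oversight.",
    "Our cafeteria audit found the coffee machine needs repair.",  # similar wording, irrelevant
]
query = "What does the legislation require for compliance audits?"

# With normalized embeddings, cosine similarity is just a dot product.
doc_vecs = model.encode(docs, normalize_embeddings=True)
q_vec = model.encode([query], normalize_embeddings=True)[0]
scores = doc_vecs @ q_vec

for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```

The point of the toy corpus is that similarity is a proxy for relevance, not a guarantee of it: a passage about a cafeteria audit shares surface vocabulary with the query and can still earn a non-trivial score.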
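And here is a hedged sketch of the kind of pre-publication control the incident was missing: checking that every quoted passage actually appears in its claimed source. This is a deliberately simple heuristic for illustration, not Open RAG Eval's method; the is_grounded function, the citation fields, and the 0.85 threshold are all assumptions.

```python
# Toy citation check: a quote is "grounded" only if a long contiguous
# chunk of it appears near-verbatim in the cited source text.
from difflib import SequenceMatcher

def is_grounded(quote: str, source_text: str, threshold: float = 0.85) -> bool:
    """True if the longest verbatim overlap covers most of the quote."""
    matcher = SequenceMatcher(None, quote.lower(), source_text.lower())
    match = matcher.find_longest_match(0, len(quote), 0, len(source_text))
    return match.size / max(len(quote), 1) >= threshold

citations = [
    {"quote": "requires annual compliance audits",
     "source": "Section 501 of the Act requires annual compliance audits."},
    {"quote": "mandates quarterly audits",  # fabricated: not in the source
     "source": "Section 501 of the Act requires annual compliance audits."},
]

for c in citations:
    status = "grounded" if is_grounded(c["quote"], c["source"]) else "UNVERIFIED"
    print(f"{status}: {c['quote']!r}")
```

A check this crude would still have flagged invented quotations before publication; production-grade grounding evaluation, of the kind Open RAG Eval is built for, goes well beyond string overlap.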