HHEM | Flash Update: Anthropic Claude 3
Blog post from Vectara
On March 4, 2024, Anthropic launched Claude 3, a family of three AI models: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus, characterized by speed, diligence, and power, respectively. Notably, Claude 3 Opus has performed on par with or better than OpenAI's GPT-4, particularly on factual consistency as measured by the Hughes Hallucination Evaluation Model (HHEM). Although Opus is the most powerful of the three, it ranks slightly below Claude 3 Sonnet in factual consistency; because the evaluation set is limited, this should not be read as a definitive ranking. The Claude 3 models also outperform Google's Gemma model in factual consistency. Claims that Claude 3 surpasses GPT-4 should still be treated with caution, especially because, unlike many recent model releases, the Claude 3 models are not open source and are accessible only via the Anthropic API.