Hemingway-bench Leaderboard: Because Good Writing Isn't a Checklist of Vibes
Blog post from Surge AI
Hemingway-bench is a new AI writing leaderboard that evaluates AI-generated writing on genuine creativity, nuance, and depth rather than superficial metrics. Traditional benchmarks such as EQ-Bench Creative Writing and LMArena often reward models for meeting basic structural criteria or for producing clickbait-style content. Hemingway-bench instead relies on expert human evaluators to assess writing across a spectrum of real-world and frontier tasks, ranging from creative storytelling to business document writing, and judges models on dimensions such as creativity, coherence, and writing quality. The results highlight Google's Gemini and Anthropic's Claude Opus as top performers, noting their strengths in creating engaging narratives and maintaining a human-like voice. The initiative aims to shift the focus from rewarding flashy, superficial prose to recognizing writing that offers depth, emotional resonance, and insightful commentary, and in doing so to redefine what counts as quality in AI-generated content.