
Introducing a new evaluation for creative ability in Large Language Models

Blog post from Hume

Post Details
Company: Hume
Date Published: -
Author: Hume AI Team
Word Count: 1,043
Language: English
Hacker News Points: -
Summary

HumE-1 (Human Evaluation 1) is a new evaluation for large language models (LLMs) that uses human ratings to assess how well models perform creative tasks in ways that matter to people: by evoking the intended feelings. LLMs are already used to write books and articles, assist legal professionals and healthcare practitioners, and provide mental health support, yet existing benchmarks fail to capture how these models affect our satisfaction and well-being. HumE-1 evaluates LLMs on tasks such as writing motivational quotes, interesting facts, funny jokes, beautiful haikus, charming limericks, scary horror stories, appetizing descriptions of food, and persuasive arguments for charity donations. The evaluation uses honest, naturalistic prompts to better reflect real-life use. In the first round of results, Gemini Ultra performed best, followed by GPT-4 Turbo, and both models left significant room for improvement.
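To make the protocol concrete, here is a minimal Python sketch of how a human-rating evaluation along these lines could be structured. The task names are drawn from the post, but the prompts, the 1-10 rating scale, and every function name here are illustrative assumptions, not Hume's actual implementation.

```python
# Hypothetical sketch of a HumE-1-style evaluation loop. Task names come
# from the post; prompts, rating scale, and helpers are assumptions.
from statistics import mean

# Naturalistic prompts for a few of the creative tasks the post lists.
TASKS = {
    "motivational_quote": "Share a motivational quote to start my day.",
    "funny_joke": "Tell me a joke that will make me laugh.",
    "beautiful_haiku": "Write a beautiful haiku about autumn.",
}

def generate(model_name: str, prompt: str) -> str:
    """Placeholder for a call to the model under evaluation."""
    raise NotImplementedError("Swap in a real LLM API call here.")

def collect_human_ratings(text: str, task: str) -> list[int]:
    """Placeholder for gathering 1-10 ratings from human judges on how
    well the text evokes the feeling the task targets (funny, beautiful...)."""
    raise NotImplementedError("Swap in a human-annotation pipeline here.")

def evaluate(model_name: str) -> dict[str, float]:
    """Average the human ratings per task; overall score is the task mean."""
    scores = {}
    for task, prompt in TASKS.items():
        output = generate(model_name, prompt)
        scores[task] = mean(collect_human_ratings(output, task))
    scores["overall"] = mean(scores.values())
    return scores
```

The design choice this illustrates is the one the post emphasizes: the score comes entirely from human judgments of the felt effect of each output, rather than from automated metrics or model-graded rubrics.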