
Introducing a new evaluation for creative ability in Large Language Models

Blog post from Hume

Post Details
Company: Hume
Date Published: -
Author: Hume AI Team
Word Count: 1,043
Language: English
Hacker News Points: -
Summary

HumE-1 (Human Evaluation 1) is a new evaluation for large language models (LLMs) that uses human ratings to assess how well models perform creative tasks in ways that matter to people: by evoking the intended feelings. LLMs are already used to write books and articles, assist legal professionals and healthcare practitioners, and provide mental health support, yet existing benchmarks fail to capture how these models affect our satisfaction and well-being. HumE-1 evaluates LLMs on tasks such as writing motivational quotes, interesting facts, funny jokes, beautiful haikus, charming limericks, scary horror stories, appetizing descriptions of food, and persuasive arguments for charity donations. The evaluation uses honest, naturalistic prompts to better reflect real-life use. In the first round of results, Gemini Ultra performed best, followed by GPT-4 Turbo, and both models left significant room for improvement.
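To make the protocol concrete, here is a minimal Python sketch of how a human-rating evaluation along these lines could be structured. The task names are drawn from the post, but the prompts, the 1-10 rating scale, and every function name here are illustrative assumptions, not Hume's actual implementation.

```python
# Hypothetical sketch of a HumE-1-style evaluation loop. Task names come
# from the post; prompts, rating scale, and helpers are assumptions.
from statistics import mean

# Naturalistic prompts for a few of the creative tasks the post lists.
TASKS = {
    "motivational_quote": "Share a motivational quote to start my day.",
    "funny_joke": "Tell me a joke that will make me laugh.",
    "beautiful_haiku": "Write a beautiful haiku about autumn.",
}

def generate(model_name: str, prompt: str) -> str:
    """Placeholder for a call to the model under evaluation."""
    raise NotImplementedError("Swap in a real LLM API call here.")

def collect_human_ratings(text: str, task: str) -> list[int]:
    """Placeholder for gathering 1-10 ratings from human judges on how
    well the text evokes the feeling the task targets (funny, beautiful...)."""
    raise NotImplementedError("Swap in a human-annotation pipeline here.")

def evaluate(model_name: str) -> dict[str, float]:
    """Average the human ratings per task; overall score is the task mean."""
    scores = {}
    for task, prompt in TASKS.items():
        output = generate(model_name, prompt)
        scores[task] = mean(collect_human_ratings(output, task))
    scores["overall"] = mean(scores.values())
    return scores
```

The design choice this illustrates is the one the post emphasizes: the score comes entirely from human judgments of the felt effect of each output, rather than from automated metrics or model-graded rubrics.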