Introducing WM Bench: A Benchmark for Cognitive Intelligence in World Models

Blog post from HuggingFace

Post Details
Company: HuggingFace
Author: VIDRAFT_LAB
Word Count: 1,563
Summary

WM Bench is a benchmark for evaluating the cognitive intelligence of world models: it tests whether a model actually understands its environment rather than merely rendering it convincingly. Existing benchmarks focus on visual and motion realism. WM Bench adds a cognitive dimension, scoring models on prediction-based reasoning, threat response, emotion escalation, contextual memory utilization, and adaptive recovery. The benchmark is organized into three pillars (Perception, Cognition, and Embodiment) that cover ten categories through 100 scenarios, scored on a 1000-point scale. Prometheus v1.0, a reference world model, serves as the baseline for evaluation, and its results highlight both strengths and current limitations in cross-embodiment transfer. WM Bench is part of the FINAL Bench family. Its scoring rubrics are released openly and feedback is invited, with the aim of sparking discussion and improvement within the AI community. The authors present it as an early iteration, with potential limitations in its complexity and scoring estimates.
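
To make the scoring structure concrete, here is a minimal sketch of how 100 scenario results could roll up into a 1000-point total with per-pillar subtotals. It is not the benchmark's published rubric. The equal weighting of 10 points per scenario, the `ScenarioResult` type, the `aggregate` function, and the placeholder category labels are all assumptions made for illustration; only the three pillar names, the scenario count, and the point scale come from the summary.

```python
from collections import defaultdict
from dataclasses import dataclass

TOTAL_POINTS = 1000      # from the summary: 1000-point scale
NUM_SCENARIOS = 100      # from the summary: 100 scenarios
# Assumption: every scenario carries equal weight (the real rubric may differ).
POINTS_PER_SCENARIO = TOTAL_POINTS / NUM_SCENARIOS  # 10.0

PILLARS = ("Perception", "Cognition", "Embodiment")


@dataclass
class ScenarioResult:
    """Hypothetical record for one scored scenario."""
    pillar: str      # one of PILLARS
    category: str    # placeholder label; real category names are not given here
    fraction: float  # share of the scenario's points earned, from 0.0 to 1.0


def aggregate(results):
    """Return (total, per_pillar) points under the equal-weight assumption."""
    per_pillar = defaultdict(float)
    for r in results:
        if r.pillar not in PILLARS:
            raise ValueError(f"unknown pillar: {r.pillar}")
        if not 0.0 <= r.fraction <= 1.0:
            raise ValueError("fraction must be between 0.0 and 1.0")
        per_pillar[r.pillar] += r.fraction * POINTS_PER_SCENARIO
    total = sum(per_pillar.values())
    return total, dict(per_pillar)
```

Under this assumption, a model that earns full marks on every one of the 100 scenarios reaches exactly 1000 points, and the per-pillar subtotals show where the points were earned or lost.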