cua-bench: A Framework for Benchmarking, Training Data, and RL Environments for Computer-Use Agents

Post Details

Company

HuggingFace

Date Published

Dec. 16, 2025

Author

Francesco Bonacci and Dillon DuPont

Word Count

1,086

Company Posts That Month

48

Source URL

huggingface.co/blog/cua-ai/cua-bench

Summary

Cua-Bench is a versatile, scalable framework that addresses the inconsistent behavior of computer-use agents across varying desktop environments, where minor UI changes can cause up to 10x variance in performance. Existing benchmarks rely on static VM snapshots and fixed configurations. Cua-Bench instead generates diverse training data, verified trajectories, and RL environments that can be customized along multiple dimensions, such as platform, device, visual style, and resolution. It provides a Playwright-like Python API for defining oracle solutions. The resulting multi-step task trajectories can be re-rendered across different OS themes, producing robust cross-platform training data. For RL training, Cua-Bench also includes simulators, with adapters for existing benchmarks and simulated shell applications (clones of popular apps such as Spotify and Slack) that support realistic agent interactions without requiring virtual machines.
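To make "customizable along multiple dimensions" concrete, here is a minimal plain-Python sketch (not cua-bench's API) that enumerates every combination of a few variation axes and pairs one abstract oracle trajectory with each combination. The axis values, the `EnvVariant` class, the `rerender` helper, and the example steps are all hypothetical illustrations; the post does not say how cua-bench represents variants or trajectories.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical variation axes; these values are illustrative, not cua-bench's real options.
PLATFORMS = ["windows", "macos", "linux"]
THEMES = ["light", "dark"]
RESOLUTIONS = [(1280, 720), (1920, 1080)]


@dataclass(frozen=True)
class EnvVariant:
    """One hypothetical rendering configuration for a desktop environment."""
    platform: str
    theme: str
    resolution: tuple


def enumerate_variants(platforms, themes, resolutions):
    """Return every combination of the axes (their Cartesian product)."""
    return [EnvVariant(p, t, r) for p, t, r in product(platforms, themes, resolutions)]


def rerender(trajectory, variants):
    """Pair one abstract oracle trajectory with each variant.

    Hypothetical helper: a real system would replay the steps in each
    rendered environment and record new observations; here we only copy
    the step list to show how one solution fans out into many samples.
    """
    return [(variant, list(trajectory)) for variant in variants]


# Hypothetical oracle steps for a "play a song" task, as (action, target) pairs.
oracle = [("click", "Search"), ("type", "lofi beats"), ("click", "Play")]

variants = enumerate_variants(PLATFORMS, THEMES, RESOLUTIONS)
samples = rerender(oracle, variants)
print(len(variants))  # 3 * 2 * 2 = 12
```

Each added axis multiplies the number of variants, which is why a single oracle solution can yield many distinct cross-platform samples once it is re-rendered per configuration.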