
cua-bench: A Framework for Benchmarking, Training Data, and RL Environments for Computer-Use Agents

Blog post from HuggingFace

Post Details
Author: Francesco Bonacci and Dillon DuPont
Word Count: 1,086
Summary

Cua-Bench is a scalable framework that targets the brittleness of computer-use agents across desktop environments, where minor UI changes can cause up to 10x variance in performance. Unlike existing benchmarks, which rely on static VM snapshots and fixed configurations, Cua-Bench generates diverse training data, verified trajectories, and RL environments that can be customized along multiple dimensions, such as platform, device, graphic style, and resolution. The framework provides a Playwright-like Python API for defining oracle solutions. These oracles produce multi-step task trajectories that can be re-rendered across different OS themes, yielding robust cross-platform training data. Cua-Bench also includes simulators for RL training: adapters for existing benchmarks, plus simulated shell applications, such as clones of Spotify and Slack, that support realistic agent interactions without virtual machines.
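To make the idea of an oracle solution replayed across environment variants more concrete, here is a minimal sketch in plain Python. It does not use the Cua-Bench API: every name in it (`Variant`, `Step`, `FakeSession`, `oracle`, `verify`, `generate`) is hypothetical, and the "session" only records actions in memory instead of driving a real or simulated desktop. It illustrates one plausible shape for the workflow described above, in which a scripted solution addresses UI elements by semantic selectors, so the same steps can be replayed under each combination of platform, theme, and resolution, and only trajectories that pass a check are kept.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import product

# NOTE: every name below is hypothetical and for illustration only.
# None of this is the actual Cua-Bench API.


@dataclass(frozen=True)
class Variant:
    """One point in the space of environment variations."""
    platform: str
    theme: str
    resolution: tuple[int, int]


@dataclass
class Step:
    """A single recorded action, addressed by selector rather than pixels."""
    action: str
    target: str
    value: str = ""


@dataclass
class FakeSession:
    """Stand-in for a UI session: records steps and tracks toy app state."""
    variant: Variant
    steps: list[Step] = field(default_factory=list)
    state: dict[str, str] = field(default_factory=dict)

    def click(self, target: str) -> None:
        self.steps.append(Step("click", target))
        if target == "button#save":
            # Saving copies the current title into the persisted state.
            self.state["saved"] = self.state.get("input#title", "")

    def fill(self, target: str, value: str) -> None:
        self.steps.append(Step("fill", target, value))
        self.state[target] = value


def oracle(session: FakeSession) -> None:
    """Scripted reference solution for the toy task 'create a note'."""
    session.click("button#new-note")
    session.fill("input#title", "Groceries")
    session.click("button#save")


def verify(session: FakeSession) -> bool:
    """Check the end state, not the exact steps taken."""
    return session.state.get("saved") == "Groceries"


def generate(platforms, themes, resolutions) -> list[FakeSession]:
    """Replay the oracle under every variant and keep verified runs."""
    verified = []
    for platform, theme, resolution in product(platforms, themes, resolutions):
        session = FakeSession(Variant(platform, theme, resolution))
        oracle(session)
        if verify(session):
            verified.append(session)
    return verified


trajectories = generate(
    platforms=["windows", "macos"],
    themes=["light", "dark"],
    resolutions=[(1280, 720), (1920, 1080)],
)
print(len(trajectories), "verified trajectories")
```

Because the oracle refers to elements by selector and the check inspects the end state, adding a new theme or resolution in this sketch only grows the list of variants, not the task definition. A real framework would additionally render screenshots for each variant, which this toy version omits.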