✴️ ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use

Blog post from HuggingFace

Post Details
Author: Ziyang Luo and Kaixin Li
Word Count: 576
Summary

ScreenSpot-Pro is a benchmark for evaluating GUI grounding models in high-resolution professional environments, covering 23 applications across 5 industries and 3 operating systems. It targets the difficulty of interacting with complex software interfaces on high-resolution screens, where existing models perform poorly: the best achieves only 18.9% accuracy. These results, together with the limited success of current multimodal large language models (MLLMs) in professional settings, point to the need for specialized models and techniques. Strategies such as ReGround raise accuracy to 40.2%, but significant challenges remain in accurately locating and interacting with small UI elements in professional software. The initiative aims to spur the development of more capable models and to foster community collaboration that improves the usability and performance of GUI agents in demanding professional applications.
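The summary reports accuracy figures without defining how a grounding prediction is scored. The following is a minimal sketch that assumes the point-in-box criterion used by ScreenSpot-style evaluations: a prediction counts as correct when the predicted click point falls inside the target element's bounding box. The `BBox` class, the function names, and all coordinates and screen sizes are hypothetical illustrations, not the benchmark's official evaluation code. The `area_fraction` helper shows why small targets are hard: a hypothetical 24x24 pixel icon covers only a tiny share of a 4K screen.

```python
from dataclasses import dataclass


@dataclass
class BBox:
    """Hypothetical axis-aligned bounding box in screen pixels."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Assumed criterion: a click counts as a hit if it lies inside the box (edges inclusive).
        return self.left <= x <= self.right and self.top <= y <= self.bottom

    def area(self) -> float:
        return (self.right - self.left) * (self.bottom - self.top)


def grounding_accuracy(predictions, targets) -> float:
    """Fraction of predicted (x, y) click points that land inside their target boxes."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(predictions)


def area_fraction(box: BBox, screen_w: int, screen_h: int) -> float:
    """Share of the whole screen covered by the target element."""
    return box.area() / (screen_w * screen_h)


# Hypothetical example: two small 24x24 targets; only the first prediction hits.
targets = [BBox(100, 200, 124, 224), BBox(10, 10, 34, 34)]
preds = [(105.0, 210.0), (3000.0, 50.0)]
acc = grounding_accuracy(preds, targets)
frac = area_fraction(targets[0], 3840, 2160)
print(f"accuracy={acc:.2f}, target covers {frac:.6%} of the screen")
```

Scoring by a single point rather than by box overlap matches how a GUI agent acts (it clicks one location), which is also why tiny elements on large, dense screens leave so little margin for error.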