✴️ ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use
Blog post from HuggingFace
ScreenSpot-Pro is a benchmark for evaluating GUI grounding models in high-resolution, professional environments. It spans 23 applications across 5 industries and 3 operating systems. The benchmark targets the difficulty of interacting with intricate software interfaces on high-resolution screens, where existing models perform poorly: the best achieves only 18.9% accuracy. This limited success of current multimodal large language models (MLLMs) in professional settings points to a need for specialized models and techniques. Strategies such as ReGround raise accuracy to 40.2%, but accurately locating and interacting with small UI elements in professional software remains a significant challenge. The project aims to encourage the development of more capable models and to foster community collaboration on improving the usability and performance of GUI agents in demanding professional applications.
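
To make the accuracy figures concrete, here is a minimal sketch of how a grounding score of this kind could be computed. The summary does not state the scoring rule, so the sketch assumes a common convention for GUI grounding: a prediction is a single click point, and it counts as correct when it lands inside the target element's bounding box. The names `point_in_box` and `grounding_accuracy` are hypothetical and are not taken from the benchmark's code.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]              # (x, y) in pixels
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def point_in_box(point: Point, box: Box) -> bool:
    """Return True if the point lies inside the box (edges included)."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


def grounding_accuracy(predictions: Sequence[Point], targets: Sequence[Box]) -> float:
    """Fraction of predicted click points that fall inside their target box.

    Assumed scoring rule (not confirmed by the source): one point per
    instruction, judged against one ground-truth bounding box.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(predictions, targets))
    return hits / len(predictions)


if __name__ == "__main__":
    # Two illustrative samples on a 3840x2160 screen with small targets.
    preds = [(1920.0, 1080.0), (100.0, 100.0)]
    boxes = [(1910.0, 1070.0, 1930.0, 1090.0), (3000.0, 2000.0, 3020.0, 2020.0)]
    print(grounding_accuracy(preds, boxes))
```

Under this rule, a small target on a high-resolution screen leaves very little room for error: a 20-pixel-wide icon on a 3840-pixel-wide display covers roughly half a percent of the screen width, which is one way to see why small UI elements are hard to hit.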