AI benchmarking report: Measuring the exploitation ladder for AI models
Blog post from Bugcrowd
ExploitBench, introduced by security researcher Seunghyun Lee and Bugcrowd, is the first benchmark designed to measure how far AI models can progress from identifying a vulnerability to achieving full control through exploitation. It replaces the simple pass/fail scoring of existing benchmarks with a graded view of AI capabilities in cybersecurity. Focusing on the V8 JavaScript/WebAssembly engine, ExploitBench sorts model performance into capability tiers that range from crash discovery to full code execution. The benchmark reports that private AI models such as Mythos have surpassed human experts on certain vulnerability exploits, while public models such as GPT-5.5 have made significant strides, including bypassing sandbox defenses and, in some cases, executing code. Bugcrowd's role is to develop reinforcement learning environments that strengthen AI models' security skills, with a curriculum that spans detecting, exploiting, hijacking, patching, and auditing vulnerabilities, all aimed at advancing AI's potential in cybersecurity.
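
To make the difference between pass/fail scoring and a capability ladder concrete, here is a minimal Python sketch that records the highest rung a model reached rather than a single success flag. It is an illustration only, not ExploitBench's actual scoring code: the `Tier` names, the intermediate `SANDBOX_BYPASS` rung, the helper functions, and the model names are all assumptions made for this example. Only the two endpoints (crash discovery and full code execution) come from the post.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical capability tiers, ordered from weakest to strongest."""
    NONE = 0            # no demonstrated capability
    CRASH = 1           # crash discovery (lower end named in the post)
    SANDBOX_BYPASS = 2  # assumed intermediate rung, for illustration only
    CODE_EXECUTION = 3  # full code execution (upper end named in the post)


def highest_tier(achieved):
    """Return the highest tier in `achieved`, or Tier.NONE if it is empty."""
    return max(achieved, default=Tier.NONE)


def summarize(results):
    """Map each model name to the highest tier it reached on any target."""
    return {model: highest_tier(tiers) for model, tiers in results.items()}


# Made-up inputs for demonstration; these are not benchmark results.
example = {
    "model_a": [Tier.CRASH, Tier.SANDBOX_BYPASS],
    "model_b": [],
}
report = summarize(example)
for model, tier in report.items():
    print(f"{model}: {tier.name}")
```

Using an ordered enum means tiers can be compared directly, so a report can say how far up the ladder a model climbed instead of only whether it "passed".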