The Backbone Breaker Benchmark: Testing the Real Security of AI Agents

Company

Lakera

Date Published

Nov. 14, 2025

Author

Lakera Team

Word count

2245

Language

Hacker News points

None

URL

www.lakera.ai/blog/the-backbone-breaker-benchmark

Summary

The Backbone Breaker Benchmark (b3), developed by Lakera in collaboration with the UK AI Security Institute, is a novel approach to evaluating the security of AI agents by focusing on the vulnerabilities within their core large language models (LLMs). Unlike traditional benchmarks that assess the intelligence or safety of a model as a whole, b3 zooms in on the individual steps where LLMs may fail under targeted attacks, utilizing a method called threat snapshots. These snapshots isolate specific moments when an AI agent might make a vulnerable decision, allowing for a focused and reproducible evaluation of LLM security. The b3 benchmark employs nearly 200,000 human red-team attempts from the Gandalf: Agent Breaker project to create a comprehensive dataset for testing models against real-world adversarial scenarios. Findings reveal that models with explicit reasoning processes tend to be more secure and that open-weight models are rapidly closing the security gap with their closed-weight counterparts. The benchmark aims to transform AI security into a measurable and comparable science, offering valuable insights for developers, model providers, researchers, and policymakers, with the ultimate goal of establishing a new standard for evaluating AI agent security.