The Year of the Agent: What Recent Attacks Revealed in Q4 2025 (and What It Means for 2026)
Blog post from Lakera
In the fourth quarter of 2025, the evolution of agentic AI systems presented new challenges and opportunities for both developers and attackers. As these systems began interacting with documents, tools, and external data, the threat landscape shifted, with attackers quickly adapting to exploit new vulnerabilities. The primary goal for attackers was system prompt extraction, using techniques like hypothetical scenarios and obfuscation to reveal sensitive information. Additionally, content safety bypasses became more subtle, with attackers framing prompts in ways that circumvented direct policy challenges. Exploratory probing emerged as a tactic to understand model vulnerabilities, while agent-specific attacks revealed attempts to access confidential data and embed malicious instructions in external content. Indirect attacks, which required fewer attempts than direct injections, highlighted the growing complexity of AI systems. As AI continues to advance, these trends underscore the need for comprehensive security measures to cover every interaction and anticipate new attack vectors.