Finding vulnerabilities in modern web apps using Claude Code and OpenAI Codex

Blog post from Semgrep

Post Details
Company: Semgrep
Author: Romain Gaucher, Vasilii Ermilov, Clint Gibler
Word Count: 3,610
Language: English
Summary

An evaluation of two AI coding agents, Anthropic's Claude Code and OpenAI Codex, found that they can identify vulnerabilities in real-world Python web applications, but with significant limitations. Across 11 large open-source projects, Claude Code found 46 vulnerabilities at a 14% true positive rate (TPR) and Codex found 21 at an 18% TPR, so both agents produced a high rate of false positives. The agents were proficient at detecting some vulnerability classes, such as Insecure Direct Object References (IDOR), but struggled with SQL Injection and Cross-Site Scripting (XSS), which require tracing data flows across multiple files and functions. The agents are also non-deterministic: repeated analyses of the same code give inconsistent results, which makes it hard to ensure comprehensive vulnerability detection. Even so, the research suggests that AI tools can complement traditional security practices by providing contextual insight, and that combining AI-driven analysis with traditional static analysis could make security tooling more effective.
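
To make the reported rates concrete, the sketch below back-calculates roughly how many findings each agent must have reported. It assumes that TPR here means the share of reported findings confirmed as real vulnerabilities, which is how the summary's false-positive framing reads. The helper name `implied_totals` is hypothetical, and because the published percentages are rounded, the outputs are estimates rather than figures from the post.

```python
def implied_totals(true_positives: int, tpr: float) -> tuple[int, int]:
    """Estimate (total reported findings, false positives).

    Assumes tpr = true_positives / total_reported. The input rate is
    rounded in the source, so the result is approximate.
    """
    total = round(true_positives / tpr)
    return total, total - true_positives


# Counts and rates as stated in the summary above.
for agent, tp, tpr in [("Claude Code", 46, 0.14), ("OpenAI Codex", 21, 0.18)]:
    total, fp = implied_totals(tp, tpr)
    print(f"{agent}: ~{total} findings reported, ~{fp} of them false positives")
```

Under that assumption, a reviewer would have to triage on the order of several hundred Claude Code findings and about a hundred Codex findings to reach the confirmed vulnerabilities.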