Can LLMs Detect IDORs? Understanding the Boundaries of AI Reasoning
Blog post from Semgrep
The research examined how effectively AI coding agents, specifically Claude Code and OpenAI Codex, identify Insecure Direct Object Reference (IDOR) vulnerabilities in open-source applications. The models detected 15 real vulnerabilities but also produced 93 false positives. They performed best in simpler scenarios where authorization logic was either absent or contained within a single function or file, and struggled with complex cases where the logic spanned multiple files or lived in middleware.

The research highlights the models' potential to catch localized logical flaws that traditional tools might miss, while also noting their limitations, such as high false positive rates and non-deterministic output. The findings suggest that AI models can add value in vulnerability detection, but human oversight remains essential to validate and interpret the results. The study also emphasizes the importance of better prompt engineering and additional scaffolding to improve the models' accuracy and reliability in detecting IDOR vulnerabilities.
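To illustrate the kind of "simpler scenario" the models handled best, the sketch below shows a minimal, hypothetical IDOR: an endpoint where the only missing piece is an ownership check inside a single function. This is not code from the study; the route, data model, and current_user_id helper are illustrative assumptions.

```python
# Hypothetical Flask endpoint illustrating a single-function IDOR:
# the record is fetched purely by the ID in the URL, with no check
# that it belongs to the authenticated user.
from flask import Flask, abort, jsonify

app = Flask(__name__)

# Toy data store standing in for a table of user-owned records.
INVOICES = {
    1: {"owner_id": 10, "amount": 250},
    2: {"owner_id": 20, "amount": 900},
}

def current_user_id() -> int:
    # Placeholder for a real session/auth lookup; assume user 10 is logged in.
    return 10

@app.route("/invoices/<int:invoice_id>")
def get_invoice(invoice_id: int):
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        abort(404)
    # IDOR: any authenticated user can read any invoice by guessing its ID.
    # The fix is a localized ownership check, e.g.:
    #   if invoice["owner_id"] != current_user_id():
    #       abort(403)
    return jsonify(invoice)
```

Because the flaw and its fix are visible within one function, this is the sort of case where, per the findings above, the models were most reliable; once the authorization decision moves into shared middleware or another file, detection became much harder.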