
Can LLMs Detect IDORs? Understanding the Boundaries of AI Reasoning

Blog post from Semgrep

Post Details

Company: Semgrep
Date Published:
Author: Vasilii Ermilov
Word Count: 3,426
Language: English
Hacker News Points: -
Summary

The research evaluated how well AI coding agents, specifically Claude Code and OpenAI Codex, identify Insecure Direct Object Reference (IDOR) vulnerabilities in open-source applications. The models detected 15 real vulnerabilities but also generated 93 false positives. They performed best in simpler scenarios where authorization logic was either absent or contained within a single function or file, and struggled with complex cases involving cross-file or middleware-based authorization. The research highlighted the models' potential to catch localized logical flaws that traditional tools tend to miss, while noting their limitations: high false positive rates and non-deterministic outputs. The findings suggest that AI models can be valuable for vulnerability detection, but human oversight remains crucial to validate and interpret their results. The study recommends better prompt engineering and additional scaffolding to improve the models' accuracy and reliability in detecting IDORs.
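
To illustrate the vulnerability class under study: an IDOR occurs when an endpoint fetches a resource using a client-supplied identifier without verifying that the requester is authorized to access it. Below is a minimal sketch in Python/Flask contrasting the vulnerable pattern with the "authorization logic contained within a single function" case the post says the models handled best. The route paths, the INVOICES store, and the hard-coded user ID are illustrative assumptions, not code from the study.

```python
from flask import Flask, abort, g, jsonify

app = Flask(__name__)

# Hypothetical in-memory store standing in for a real database.
INVOICES = {
    1: {"owner_id": 42, "total": "19.99"},
    2: {"owner_id": 7, "total": "250.00"},
}

@app.before_request
def load_user():
    # Placeholder authentication: a real app would derive the current
    # user from a session cookie or token.
    g.user_id = 42

# Vulnerable pattern: the record is fetched purely by the
# client-supplied ID, so any authenticated user can read any
# invoice by guessing IDs (an IDOR).
@app.route("/invoices/<int:invoice_id>")
def get_invoice(invoice_id):
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        abort(404)
    return jsonify(invoice)

# Fixed pattern: the ownership check sits in the same function as
# the lookup, the localized case the post reports the models
# detected most reliably.
@app.route("/v2/invoices/<int:invoice_id>")
def get_invoice_checked(invoice_id):
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        abort(404)
    if invoice["owner_id"] != g.user_id:
        abort(403)
    return jsonify(invoice)
```

When the same ownership check is instead enforced in shared middleware or a helper in another file, the lookup and the authorization logic no longer appear together, which is exactly the cross-file case the study reports the models struggled with.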