Snyk VulnBench JS 1.0: Can LLMs Find the Same Bugs Twice?

Post Details

Company

Snyk

Date Published

June 29, 2026

Author

Liran Tal

Word Count

3,474

Company Posts That Month

18

Language

English

Source URL

snyk.io/blog/snyk-vulnbench-js-1-0-llm-security-review-repeatability

Summary

A study of 300 vulnerability-finding scans run with Snyk VulnBench JS 1.0 measures how repeatable security reviews of JavaScript code are when performed by agentic large language models (LLMs). The LLMs identified familiar exploit shapes, and some of those findings aligned with Snyk Code results, but their reports of vulnerabilities outside the reference set were inconsistent. Across repeated runs, reference-matched findings were comparatively stable, while unique unmatched findings varied: nearly half appeared in only one of five runs. The benchmark points to a complementary approach that combines LLMs with SAST tools such as Snyk Code, since the two uncover different security gaps and have different failure modes, with deterministic SAST giving more reliable coverage of systematic vulnerabilities. In the cost and performance analysis, more expensive LLM configurations did not necessarily deliver better vulnerability detection, which argues for efficient, combined security workflows. Future benchmarks aim to incorporate broader application structures and independent sources of ground truth to refine the understanding of what LLMs and SAST can each do in security reviews.
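
The "one of five runs" measure in the summary can be illustrated with a small tally. This is a minimal sketch and not the benchmark's actual method: the function names, the `(file, rule)` key used to identify a finding, and the sample data are all hypothetical, and it assumes the same finding can be recognized exactly across runs.

```python
from collections import Counter


def run_frequency(runs):
    """Count, for each distinct finding, how many runs reported it."""
    counts = Counter()
    for findings in runs:
        # set() ensures a finding counts at most once per run
        counts.update(set(findings))
    return counts


def single_run_share(runs):
    """Fraction of distinct findings that appeared in exactly one run."""
    counts = run_frequency(runs)
    if not counts:
        return 0.0
    singles = sum(1 for n in counts.values() if n == 1)
    return singles / len(counts)


# Hypothetical data: five repeated scans of the same code,
# with each finding keyed by (file, rule).
runs = [
    {("app.js", "sqli"), ("auth.js", "xss")},
    {("app.js", "sqli")},
    {("app.js", "sqli"), ("util.js", "path-traversal")},
    {("app.js", "sqli"), ("auth.js", "xss")},
    {("app.js", "sqli"), ("db.js", "nosql-injection")},
]

print(run_frequency(runs))
print(single_run_share(runs))
```

In this made-up data, one finding appears in every run while two of the four distinct findings appear only once, which is the kind of split between stable and unstable findings the summary describes.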
