We have Mythos at Home: GLM 5.2 beats Claude in our Cyber Benchmarks

Post Details

Company

Semgrep

Date Published

June 22, 2026

Author

Katie Paxton-Fear, Seth Jaksik, Brenden Noblitt, Erik Buchanan

Word Count

2,117

Company Posts That Month

10

Language

English

Hacker News Points

-

Source URL

semgrep.dev/blog/2026/we-have-mythos-at-home-glm-52-beats-claude-in-our-cyber-benchmarks

Summary

Semgrep evaluated various open-source models against its IDOR (insecure direct object reference) benchmark to measure vulnerability-detection performance, and the results were unexpected. Zhipu AI's GLM 5.2, an open-weight model, achieved a 39% F1 score, outperforming Claude Code's 32% at a significantly lower cost of approximately $0.17 per vulnerability detected. It still lagged behind Semgrep's multimodal pipeline, which achieved 53–61% F1 with a more sophisticated harness. The test primarily aimed to determine how much of the performance was attributable to the model itself and how much to the harness, a critical question for security tasks that rely on AI. GLM 5.2 showed promise even though it lacked the endpoint discovery support that the multimodal pipeline has, which suggests that open-weight models are now a viable option for security research. The experiment underscores the importance of harness configuration and marks a step forward in the competitiveness of open-weight models, but one successful result does not imply superiority across other tasks or datasets.
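
For readers unfamiliar with the two metrics quoted in the summary, the following is a minimal sketch of how an F1 score and a cost per detected vulnerability are typically computed. It is not Semgrep's evaluation code. The function names and the counts are hypothetical, and it assumes that "cost per vulnerability detected" means total spend divided by true positives, which the summary does not confirm.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    if true_positives == 0:
        # With no correct findings, precision or recall is zero (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


def cost_per_detection(total_cost_usd: float, true_positives: int) -> float:
    """Return total spend divided by correctly detected vulnerabilities (assumed definition)."""
    if true_positives <= 0:
        raise ValueError("cost per detection is undefined with no true positives")
    return total_cost_usd / true_positives


if __name__ == "__main__":
    # Hypothetical counts and spend, not taken from the Semgrep benchmark.
    tp, fp, fn = 8, 12, 8
    print(f"F1: {f1_score(tp, fp, fn):.2f}")
    print(f"Cost per detection: ${cost_per_detection(2.00, tp):.2f}")
```

Because F1 combines precision and recall, a model that flags many false positives can score lower than one that finds fewer vulnerabilities but reports them more precisely.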

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	5	5,172	1,006	220	-43%
AI Agents	1	4,874	1,103	240	-1%