
New data on code quality: GPT-5.2 high, Opus 4.5, Gemini 3, and more

Blog post from Sonar

Post Details
Company: Sonar
Date Published:
Author: Prasenjit Sarkar
Word Count: 1,085
Language: English
Hacker News Points: -
Summary

The Sonar LLM Leaderboard evaluates AI coding models by running more than 4,000 Java programming assignments through the SonarQube static analysis engine, measuring functional performance, structural quality, security, and maintainability.

The analysis found that while models such as Opus 4.5 Thinking and Gemini 3 Pro achieved high pass rates, they differed significantly in verbosity and complexity, which affects how maintainable and easy to work with their code is. GPT-5.2 High led on security with the lowest rate of blocker vulnerabilities per million lines of code, but struggled with high code volume and concurrency issues. Claude Sonnet 4.5, by contrast, showed the highest rate of critical security vulnerabilities and resource management leaks. Gemini 3 Pro illustrates the trade-off between performance and complexity, combining a high pass rate with low verbosity and cognitive complexity, albeit at a higher issue density.

The leaderboard aims to give engineering leaders transparency into how AI models handle core software engineering fundamentals, since code smells and violations of design best practices ultimately drive up total cost of ownership.
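The security metric cited above, blocker vulnerabilities per million lines of code, is a simple density normalization that makes issue counts comparable across models that emit very different volumes of code. A minimal sketch (the function name and the example figures are illustrative, not Sonar's published data):

```python
def issues_per_million_loc(issue_count: int, lines_of_code: int) -> float:
    """Normalize a raw issue count to a per-million-LOC density.

    This lets a terse model and a verbose model be compared on equal
    footing: absolute issue counts naturally rise with code volume.
    """
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return issue_count / lines_of_code * 1_000_000


# Illustrative figures only: 3 blocker vulnerabilities
# found across 150,000 generated lines of Java.
print(issues_per_million_loc(3, 150_000))  # 20.0
```

Note that a low density can coexist with a high absolute count when a model is very verbose, which is one reason the leaderboard reports verbosity alongside issue density.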