The Coding Personalities of Leading LLMs—GPT-5 Update
Blog post from Sonar
In this updated analysis of leading language models, GPT-5 was compared with Anthropic's Claude Sonnet 4 and other models. While GPT-5 demonstrates competitive functional performance, it does not surpass Claude Sonnet 4 in overall efficiency.

GPT-5's generated code is marked by verbosity, complexity, and a high density of code smells. Security is a relative strength: GPT-5 recorded the lowest vulnerability density among the models tested, yet it still frequently reintroduces classic security flaws. Combined with the complexity of its output, this creates long-term maintainability issues and technical debt. GPT-5 also exhibits a higher rate of logical errors, particularly control-flow mistakes, which undermines its reliability.

The analysis recommends that organizations adopt stringent governance strategies, including static analysis and thorough code review, to manage GPT-5's complexity and security risks, underscoring the importance of a "trust and verify" approach to leveraging its capabilities.
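As one illustration of the kind of "classic flaw" static analysis is meant to catch, consider path traversal, a long-known vulnerability class; the specific flaw and function names below are illustrative assumptions, not examples drawn from the report's data:

```python
import os

def read_user_file_unsafe(base_dir: str, filename: str) -> str:
    # Classic path-traversal flaw: a filename like "../../etc/passwd"
    # escapes base_dir. Static analyzers routinely flag this pattern.
    path = os.path.join(base_dir, filename)
    with open(path) as f:
        return f.read()

def read_user_file_safe(base_dir: str, filename: str) -> str:
    # Hardened version: resolve the real path and verify it stays
    # inside base_dir before opening the file.
    base = os.path.realpath(base_dir)
    path = os.path.realpath(os.path.join(base, filename))
    if os.path.commonpath([base, path]) != base:
        raise ValueError("path escapes base directory")
    with open(path) as f:
        return f.read()
```

A "trust and verify" workflow treats generated code like the first function as suspect until a static analyzer or reviewer has confirmed the guard shown in the second.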