Is Sonnet 4.5 the best coding model in the world?

Blog post from Surge AI

Post Details
Company: Surge AI
Date Published:
Author: Logan Ritchie
Word Count: 3,102
Language: English
Hacker News Points: -
Summary

The post compares Claude Sonnet 4.5 and GPT-5-Codex on coding tasks. Claude Sonnet 4.5 is the more expensive of the two but excels at structured reasoning and context integration; GPT-5-Codex is cheaper and is characterized by aggressive exploration and recovery behaviors. The benchmark dataset comprises 2,161 tasks across nine languages and was designed to test the models on real-world coding scenarios. A case study on refactoring a matrix tool illustrates each model's strengths and weaknesses: Claude Sonnet 4.5 passed the task despite struggling with header alignment, while GPT-5-Codex failed due to misinterpretation and premature termination. The post argues that the models differ more in how they reason than in skill level, and that these differences in reasoning style shape their performance. It concludes that both models run into difficulties, that the ability to maintain focus is pivotal, and that Claude Sonnet 4.5 currently sets the standard among coding models by reasoning robustly, much as a human engineer would.