Is Sonnet 4.5 the best coding model in the world?

Post Details

Company

Surge AI

Date Published

Oct. 8, 2025

Author

Logan Ritchie

Word Count

3,102

Language

English

Hacker News Points

-

Source URL

surgehq.ai/blog/sonnet-4-5-coding-model-evaluation

Summary

The analysis compares two AI coding models, Claude Sonnet 4.5 and GPT-5-Codex, on their performance in coding tasks. Claude Sonnet 4.5 is more expensive but excels at structured reasoning and context integration; GPT-5-Codex is cheaper and is noted for its aggressive exploration and recovery behaviors. The benchmark dataset consists of 2,161 tasks across nine languages and was designed to test the models on real-world coding scenarios. A case study on refactoring a matrix tool illustrates each model's strengths and weaknesses: Claude Sonnet 4.5 passed the task despite struggling with header alignment, while GPT-5-Codex failed due to misinterpretation and premature termination. The findings suggest that the two models differ more in reasoning style than in skill level, and that understanding each model's style is key to explaining its performance. The study concludes that both models encounter difficulties, that the ability to maintain focus is pivotal, and that Claude Sonnet 4.5 currently sets the standard in coding AI by demonstrating robust reasoning akin to a human engineer's.