Thinking vs Thinking: Benchmarking Claude Haiku 4.5 and Sonnet 4.5 on 400 Real PRs
Blog post from Qodo
Smaller AI models are increasingly capable of complex reasoning, as shown in a study comparing Claude Haiku 4.5 against Claude Sonnet models using the Qodo PR Benchmark on 400 real GitHub pull requests.

Across the benchmark, Claude Haiku 4.5 consistently outperformed the models it was compared against: it won a higher percentage of head-to-head comparisons and earned higher code suggestion scores in both standard and thinking modes, despite being smaller and faster. The results suggest that upgrading from Sonnet 4 to Haiku 4.5 offers meaningful performance gains at lower cost, and that Haiku 4.5 fits into existing review pipelines without changes.

This trend highlights the potential of smaller, more efficient models to deliver strong reasoning quality in code review, with practical benefits for engineering teams and useful insights for researchers.
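To make the head-to-head metric concrete, here is a minimal sketch of how win percentages can be computed from pairwise review verdicts. The function, data, and model labels are illustrative assumptions, not Qodo's actual benchmark code or results.

```python
from collections import Counter

def win_rates(judgments):
    """Compute per-model win percentages from pairwise verdicts.

    judgments: list of (model_a, model_b, winner) tuples, where
    winner is one of the two model names, or None for a tie.
    Returns a dict mapping each model to its win percentage.
    """
    wins = Counter()
    comparisons = Counter()
    for model_a, model_b, winner in judgments:
        comparisons[model_a] += 1
        comparisons[model_b] += 1
        if winner is not None:
            wins[winner] += 1
    return {m: 100 * wins[m] / comparisons[m] for m in comparisons}

# Hypothetical verdicts for three PRs (not the benchmark's real data):
sample = [
    ("haiku-4.5", "sonnet-4", "haiku-4.5"),
    ("haiku-4.5", "sonnet-4", "haiku-4.5"),
    ("haiku-4.5", "sonnet-4", None),  # tie: counted, but no winner
]
print(win_rates(sample))
```

In this setup ties still count as comparisons, so the two win rates need not sum to 100%; a benchmark could equally exclude ties or split them, which would shift the reported percentages.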