Thinking vs Thinking: Benchmarking Claude Haiku 4.5 and Sonnet 4.5 on 400 Real PRs
Blog post from Qodo
Smaller AI models are increasingly capable of complex reasoning, as shown in a study comparing Claude Haiku 4.5 against Claude Sonnet models using the Qodo PR Benchmark on 400 real GitHub pull requests.

Across the benchmark, Claude Haiku 4.5 consistently outperformed the models it was compared against: it won a higher percentage of head-to-head comparisons and earned higher code suggestion scores in both standard and thinking modes, despite being smaller and faster. The results suggest that upgrading from Sonnet 4 to Haiku 4.5 offers meaningful performance gains at lower cost, and that Haiku 4.5 fits into existing review pipelines without changes.

This trend highlights the potential of smaller, more efficient models to deliver strong reasoning quality in code review, with practical benefits for engineering teams and useful insights for researchers.
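To make the head-to-head metric concrete, here is a minimal sketch of how win percentages can be computed from pairwise review verdicts. The function, data, and model labels are illustrative assumptions, not Qodo's actual benchmark code or results.

```python
from collections import Counter

def win_rates(judgments):
    """Compute per-model win percentages from pairwise verdicts.

    judgments: list of (model_a, model_b, winner) tuples, where
    winner is one of the two model names, or None for a tie.
    Returns a dict mapping each model to its win percentage.
    """
    wins = Counter()
    comparisons = Counter()
    for model_a, model_b, winner in judgments:
        comparisons[model_a] += 1
        comparisons[model_b] += 1
        if winner is not None:
            wins[winner] += 1
    return {m: 100 * wins[m] / comparisons[m] for m in comparisons}

# Hypothetical verdicts for three PRs (not the benchmark's real data):
sample = [
    ("haiku-4.5", "sonnet-4", "haiku-4.5"),
    ("haiku-4.5", "sonnet-4", "haiku-4.5"),
    ("haiku-4.5", "sonnet-4", None),  # tie: counted, but no winner
]
print(win_rates(sample))
```

In this setup ties still count as comparisons, so the two win rates need not sum to 100%; a benchmark could equally exclude ties or split them, which would shift the reported percentages.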