OpenAI o3-mini vs Anthropic Sonnet 3.7 for Bug Detection

Company

Greptile

Date Published

April 28, 2025

Author

Daksh Gupta

Word count

725

Language

English

Hacker News points

None

URL

www.greptile.com/blog/o3-mini-vs-sonnet-3.7

Summary

OpenAI's o3-mini and Anthropic's Sonnet 3.7, two compact AI models, were compared as code reviewers on a benchmark of hard-to-catch bugs across multiple programming languages. The evaluation dataset consisted of 210 programs containing realistic but difficult-to-catch bugs in various domains and languages. Both models performed competitively: o3-mini slightly outperformed Sonnet 3.7 overall, while Sonnet 3.7 showed stronger reasoning in edge cases involving concurrency and async behavior, particularly in TypeScript and Go. The study concludes that there is no universal winner; strengths shift with language, bug type, and model architecture.