
OpenAI o3-mini vs Anthropic Sonnet 3.7 Thinking for Bug Detection

Blog post from Greptile

Post Details
Company: Greptile
Date Published:
Author: Everett Butler
Word Count: 682
Language: English
Hacker News Points: -
Summary

OpenAI's o3-mini outperformed Anthropic's Sonnet 3.7 Thinking in a bug detection benchmark, catching more bugs overall across multiple programming languages, with its clearest advantage in Python and Rust. Although Sonnet 3.7 Thinking is designed as a "thinking" model with an added planning step, it did not beat o3-mini overall. Its strengths showed in Ruby and Go, which the post describes as lower-resource languages where logical deduction plays a bigger role. The results suggest that an explicit planning step has value in certain scenarios, but Sonnet 3.7 Thinking would need to be more consistent across languages to match o3-mini, which is itself a reasoning model.