The text compares two AI models, OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet, on their ability to detect hard-to-find software bugs across multiple programming languages: Python, Go, TypeScript, Rust, and Ruby. The evaluation dataset consists of 210 programs into which the author introduced realistic but difficult-to-catch bugs. Claude 3.5 Sonnet outperformed GPT-4o overall, identifying 26 bugs versus GPT-4o's 20. Performance varied by language: Sonnet 3.5 excelled in Ruby and Go, where its reasoning about program logic paid off, while GPT-4o did better in Python, where extensive training data and familiar idioms favored pattern matching. The analysis highlights the complementary strengths of pattern-based and reasoning-based models and suggests that future AI-driven bug-detection tools should combine both approaches.