The study compares two large language models, OpenAI o1 and DeepSeek R1, on their ability to detect subtle bugs in production-style code. A dataset of 210 small programs was created across sixteen domains, each program containing a realistic bug. Both models were prompted with the same buggy code and asked to identify the issue. While both struggled with the most subtle bugs, DeepSeek R1 consistently outperformed o1 across most languages, with the largest gaps in Rust and TypeScript, where it caught noticeably more bugs than o1. The study suggests that DeepSeek R1's stronger performance may stem from differences in its training data, architecture, or learned error heuristics. The results highlight the potential of large language models for automated bug detection and verification, particularly in languages like Rust and TypeScript.
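
The study's dataset is not reproduced here, but a minimal TypeScript sketch (hypothetical, not taken from the benchmark) illustrates the kind of subtle, realistic bug the models were asked to identify: code that returns a correct-looking result while quietly corrupting state elsewhere.

```typescript
// Hypothetical example of a "subtle bug" in a small program: the output is
// correct, but the function has an unintended side effect on its input.

interface Item {
  name: string;
  quantity: number;
}

// Returns the names of the `limit` items with the highest quantity.
function topItems(items: Item[], limit: number): string[] {
  return items
    .sort((a, b) => b.quantity - a.quantity) // BUG: Array.prototype.sort mutates the caller's array in place
    .slice(0, limit)
    .map((item) => item.name);
}

const inventory: Item[] = [
  { name: "bolts", quantity: 40 },
  { name: "nuts", quantity: 120 },
  { name: "washers", quantity: 75 },
];

// The report itself looks right...
console.log(topItems(inventory, 2)); // ["nuts", "washers"]
// ...but the original insertion order has silently changed, which can break
// any later code that depends on it.
console.log(inventory.map((i) => i.name)); // ["nuts", "washers", "bolts"]
```

Bugs of this kind are easy to miss in review because the function's return value is correct; detecting them requires reasoning about side effects rather than pattern-matching on obviously wrong output, which is what makes the benchmark a meaningful test of the models' code understanding.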