Qodo scores 71.2% on SWE-bench Verified

Post Details

Company

Qodo

Date Published

Aug. 11, 2025

Author

Tomer Yanay

Word Count

1,146

Language

English

Hacker News Points

-

Source URL

www.qodo.ai/blog/qodo-command-swe-bench-verified

Summary

Qodo Command, Qodo's CLI agent, scored 71.2% on SWE-bench Verified, a benchmark that evaluates AI agents on real-world software engineering tasks. Qodo presents the result as evidence of its commitment to building agents for production environments: agents that review code, write tests, fix bugs, and generate features while staying context-aware and preserving code integrity. The benchmark's tasks are drawn from real GitHub issues, and an agent must reason about the codebase and edit it without shortcuts. Qodo Command is powered by Claude 4 through a partnership with Anthropic, and the post attributes its performance to an architecture centered on context summarization and execution planning. It uses LangGraph to build modular agentic workflows and ships tools for file system interaction, shell execution, and code analysis. The platform also automates code integrity tasks and includes Qodo Merge, a code review UI mode for maintaining quality standards. Qodo positions the product as built for real-world production rather than for benchmarks alone.
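
To make the architectural terms in the summary concrete, here is a minimal sketch of a modular agent workflow with a planning step, three tools (file system, shell, code analysis), and a context summarization step. It is not Qodo Command's implementation and does not use LangGraph: the node names, the state keys, the `run_workflow` helper, and the fixed three-step plan are all hypothetical, and a plain dictionary of nodes and edges stands in for a real graph framework.

```python
import ast
import subprocess
import sys
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]

# --- Hypothetical tools (illustrative names, not Qodo Command's API) ---

def read_file(path: str) -> str:
    """File system tool: return the text of a file."""
    return Path(path).read_text(encoding="utf-8")

def run_shell(args: List[str]) -> str:
    """Shell tool: run a command and return its stripped stdout."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()

def list_functions(source: str) -> List[str]:
    """Code analysis tool: names of the functions defined in Python source."""
    tree = ast.parse(source)
    return [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]

# --- Workflow nodes: each takes the shared state and returns it updated ---

def plan(state: State) -> State:
    """Execution planning: fix an ordered list of steps before acting.
    A real agent would ask the model for this plan; here it is hard-coded."""
    state["steps"] = ["read", "analyze", "check"]
    return state

def act(state: State) -> State:
    """Carry out the planned steps with the tools above."""
    for step in state["steps"]:
        if step == "read":
            state["source"] = read_file(state["path"])
        elif step == "analyze":
            state["functions"] = list_functions(state["source"])
        elif step == "check":
            # Parse the file in a child process as a stand-in for a test run.
            code = "import ast,sys; ast.parse(open(sys.argv[1]).read()); print('syntax ok')"
            state["check"] = run_shell([sys.executable, "-c", code, state["path"]])
    return state

def summarize(state: State) -> State:
    """Context summarization: drop the bulky source, keep a compact summary."""
    source = state.pop("source")
    names = ", ".join(state["functions"])
    state["summary"] = f"{len(source.splitlines())} lines, functions: {names}"
    return state

# --- A tiny graph: named nodes plus edges giving each node's successor ---

NODES: Dict[str, Callable[[State], State]] = {
    "plan": plan,
    "act": act,
    "summarize": summarize,
}
EDGES: Dict[str, Optional[str]] = {"plan": "act", "act": "summarize", "summarize": None}

def run_workflow(state: State, start: str = "plan") -> State:
    """Follow the edges from the start node until a node has no successor."""
    node: Optional[str] = start
    while node is not None:
        state = NODES[node](state)
        node = EDGES[node]
    return state
```

Calling `run_workflow({"path": "some_module.py"})` on an existing Python file returns a state holding the function names, the shell check output, and a short summary in place of the full source. Keeping nodes as plain functions over a shared state is what makes such a workflow modular: a step can be swapped or reordered by editing `NODES` and `EDGES` without touching the others.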