
SuperCoder 2.0 achieves 34% success rate in SWE-bench Lite, ranking #4 globally & #1 among all

Blog post from SuperAGI

Post Details
Company: SuperAGI
Author: Akshat Jain
Word Count: 1,987
Language: English
Summary

Multi-agent systems powered by Large Language Models (LLMs) have shown promise on complex tasks, including autonomous software development. SuperCoder 2.0, a system built on GPT-4o and Sonnet-3.5, achieved a 34% success rate on SWE-bench Lite, a benchmark that evaluates functional bug fixes for real-world software issues. Its architecture has two main components: Code Search and Code Generation. Code Search navigates the codebase to identify the relevant sections using a two-tiered approach that combines Retrieval-Augmented Generation (RAG) with an agent-based system; Code Generation then creates patches to fix the identified bugs. A dockerized setup keeps the evaluation reproducible and efficient. The system still struggles to identify buggy locations accurately, which points to localization as an area for further research and development. The post argues that a structured approach combining a RAG-based flow with file schemas can improve the accuracy and efficiency of autonomous code generation systems.
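
The two-tier Code Search described above can be illustrated with a minimal sketch. It rests on heavy assumptions and is not SuperAGI's implementation: plain token overlap stands in for embedding-based retrieval, a name-matching heuristic stands in for the LLM agent, and the "file schema" is taken to mean an outline of function and class names. The names `retrieve`, `file_schema`, `localize`, `REPO`, and `ISSUE` are hypothetical, and patch generation is left out.

```python
import ast
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercased word counts; splitting on non-alphanumerics also splits snake_case."""
    return Counter(t for t in re.split(r"[^a-z0-9]+", text.lower()) if t)


def retrieve(issue: str, files: dict, top_k: int = 2) -> list:
    """Tier 1 (retrieval stand-in): rank files by token overlap with the issue text."""
    query = tokenize(issue)
    scores = {path: sum((query & tokenize(src)).values()) for path, src in files.items()}
    # Highest score first; ties broken by path so the ranking is deterministic.
    return sorted(files, key=lambda p: (-scores[p], p))[:top_k]


def file_schema(source: str) -> list:
    """Outline of a Python file: (name, line number) for each function and class."""
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return [(n.name, n.lineno) for n in ast.walk(ast.parse(source)) if isinstance(n, kinds)]


def localize(issue: str, files: dict, candidates: list):
    """Tier 2 (agent stand-in): pick the schema entry whose name best matches the issue."""
    query = tokenize(issue)
    best = None
    for path in candidates:
        for name, lineno in file_schema(files[path]):
            score = sum((query & tokenize(name)).values())
            if best is None or score > best[0]:
                best = (score, path, name, lineno)
    return best[1:] if best else None


# Toy repository and issue, invented purely for illustration.
REPO = {
    "dates.py": (
        "def parse_date(text):\n"
        "    return text.split('/')\n"
        "\n"
        "def format_date(parts):\n"
        "    return '-'.join(parts)\n"
    ),
    "users.py": (
        "class User:\n"
        "    def __init__(self, name):\n"
        "        self.name = name\n"
    ),
}
ISSUE = "parse_date fails on ISO strings: parse of the date returns wrong parts"

candidates = retrieve(ISSUE, REPO, top_k=1)
location = localize(ISSUE, REPO, candidates)
print(candidates, location)
```

In this sketch the first tier narrows the search to a few files cheaply, and the second tier only inspects the outlines of those files, which mirrors the narrowing the summary attributes to the RAG step and the agent step. A real system would presumably replace both heuristics with model calls.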