Towards self-driving codebases
Blog post from Cursor
Research on scaling long-running autonomous coding has produced a new agent harness that orchestrates thousands of agents in parallel to build a web browser, a significant milestone in autonomous software development. Early attempts to have a single agent build something as complex as a browser failed because the agent was overwhelmed by the scope of the task, which prompted a move to a multi-agent system with structured roles for planning, executing, and evaluating work. That structure improved coordination but exposed problems with synchronization and resource management. The final design uses recursive planning: a root planner delegates tasks to subplanners and workers, which raises throughput and keeps the system dynamic without the overhead of global synchronization. The research underscored the importance of precise initial instructions, of favoring constraints over instructions, and of adjusting the system based on empirical data to optimize agent performance. Throughput peaked at around 1,000 commits per hour, with a manageable error rate accepted as the price of efficiency. Beyond advancing the understanding of autonomous coding systems, the work showed emergent behaviors that resemble how software teams are structured today, which suggests possible future directions for long-running AI agents.
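To make the recursive planning idea concrete, here is a minimal sketch in Python. It is not Cursor's harness: the names `Task`, `Result`, `split`, `worker`, `planner`, and the `WORKER_LIMIT` threshold are all hypothetical, and the "agents" are plain coroutines that only pretend to do work. What it tries to show is the shape of the design: a planner either hands a small task to a worker or splits it and delegates the pieces to subplanners, and each planner waits only on its own children rather than on a global barrier.

```python
import asyncio
from dataclasses import dataclass

# All names below are hypothetical and exist only for this illustration.
WORKER_LIMIT = 2  # assumed threshold: tasks this small go straight to a worker


@dataclass
class Task:
    name: str
    size: int  # assumed effort estimate used to decide whether to split


@dataclass
class Result:
    task: str
    commits: int


async def worker(task: Task) -> list[Result]:
    # Stand-in for a worker agent: yields control once, then reports one commit.
    await asyncio.sleep(0)
    return [Result(task.name, commits=1)]


def split(task: Task) -> list[Task]:
    # Assumed decomposition rule: cut the task into two smaller pieces.
    half = task.size // 2
    return [
        Task(f"{task.name}/a", half),
        Task(f"{task.name}/b", task.size - half),
    ]


async def planner(task: Task) -> list[Result]:
    # Small tasks are executed directly by a worker.
    if task.size <= WORKER_LIMIT:
        return await worker(task)
    # Larger tasks are delegated to subplanners that run concurrently.
    # This planner waits only for its own children, not for the whole tree.
    nested = await asyncio.gather(*(planner(child) for child in split(task)))
    return [result for results in nested for result in results]


def run(root: Task) -> list[Result]:
    # The root planner is just the top-level call of the same recursion.
    return asyncio.run(planner(root))


if __name__ == "__main__":
    for result in run(Task("browser", 8)):
        print(result.task, result.commits)
```

With these assumed rules, a root task of size 8 is split twice and ends up as four worker tasks. A real harness would replace `split` and `worker` with calls to agents and would need to handle failures, retries, and merge conflicts, none of which this sketch attempts.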