Scaling long-running autonomous coding
Blog post from Cursor
Researchers experimenting with autonomous coding agents have explored running multiple agents concurrently on complex software projects that would traditionally take months of human effort. Initially, agents coordinated among themselves through a shared file system, but this approach suffered from bottlenecks and made agents risk-averse. A refined design introduced a hierarchy with distinct roles: planners generate and assign tasks, while workers focus on executing them without concern for the bigger picture. This system made it possible to build ambitious projects, such as a web browser and extensive code migrations, which suggests that autonomous agents can scale to large efforts. Model choice matters: GPT-5.2 outperformed other models on tasks requiring sustained focus and precision. Simple systems proved more effective than complex ones, and prompts played a large part in guiding agent behavior. The current system shows promise, but multi-agent coordination remains challenging and needs further refinement before it is well suited to AI-assisted software development.
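
The planner and worker split described above can be illustrated with a minimal sketch. This is not Cursor's implementation: the names `Task`, `plan`, `execute`, and `run` are hypothetical, the "agents" are plain functions standing in for model calls, and a thread-safe queue is assumed as the only coordination channel. The point is the shape of the hierarchy: one role decides what to do, the other only carries tasks out.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: int
    description: str


def plan(goal: str, parts: list[str]) -> list[Task]:
    """Planner role (hypothetical): break a goal into independent tasks."""
    return [Task(i, f"{goal}: {part}") for i, part in enumerate(parts)]


def execute(task: Task) -> str:
    """Worker role (hypothetical): stand-in for an agent completing one task."""
    return f"done[{task.task_id}] {task.description}"


def run(goal: str, parts: list[str], num_workers: int = 3) -> list[str]:
    """Plan once, then let workers drain the task queue concurrently."""
    tasks: queue.Queue = queue.Queue()
    results: dict[int, str] = {}
    results_lock = threading.Lock()

    for task in plan(goal, parts):
        tasks.put(task)

    def worker() -> None:
        # Workers know nothing about the overall goal; they only take
        # the next task and report its outcome.
        while True:
            try:
                task = tasks.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            outcome = execute(task)
            with results_lock:
                results[task.task_id] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Return outcomes in task order so the result is deterministic.
    return [results[i] for i in sorted(results)]


if __name__ == "__main__":
    for line in run("browser", ["parser", "layout", "network"], num_workers=2):
        print(line)
```

In this sketch workers never negotiate with each other, which avoids the self-coordination bottleneck mentioned earlier. A real system would replace `plan` and `execute` with prompted model calls and would also need a way to handle failed or conflicting tasks.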