Legacy-Bench: Can AI Agents Maintain the World's Most Critical Software?

Post Details

Company

Factory

Date Published

April 1, 2026

Authors

Leo Tchourakov, Abhay Singhal, Eno Reyes

Word Count

1,882

Company Posts That Month

7

Language

English

Hacker News Points

-

Post removed?

No

Source URL

factory.ai/news/legacy-bench

Summary

Legacy-Bench is a new benchmark that evaluates how well AI agents handle legacy software engineering tasks across six language families, including COBOL, Fortran, and Java 7. These languages are foundational, yet they are increasingly hard to maintain as experienced engineers retire and complex business rules stay embedded in the code. The benchmark's tasks cover fixing bugs, implementing new functionality, and migrating code, reflecting real-world work on critical infrastructure. Performance varies significantly among AI models: agents do best at bug fixing where errors are visible, as in Java 7, but struggle with COBOL because of its silent errors and complex format precision requirements. The results also indicate that reading and writing new legacy code remains difficult for agents, and that migration success depends heavily on the target language. No single model consistently outperforms the others across all tasks; each shows distinct strengths and weaknesses, which underscores the need for systematic verification and iteration in legacy environments. As AI models gain better training on legacy languages and stronger self-verification capabilities, the performance gap between legacy and modern benchmarks is expected to narrow. The findings offer insights for teams modernizing legacy systems.

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
AI Agents	3	4,430	1,100	236	-3%
