Enabling Agent 3 to Self-Test at Scale with REPL-Based Verification

Post Details

Company

Replit

Date Published

Dec. 15, 2025

Author

Peter Zhong, Jacky Zhao, and Ryan Carelli

Word Count

2,537

Language

-

Hacker News Points

-

Source URL

replit.com/blog/automated-self-testing

Summary

Replit developed a novel REPL-based verification system to address "Potemkin interfaces": features that look functional in the UI but have no working implementation behind them. The problem was particularly evident in Agent 3, which needs robust self-verification to run autonomously and reliably. Replit's hybrid testing approach combines traditional browser automation frameworks like Playwright with the flexibility of code execution, so the agent can run complex tests in real time. This lets the agent verify both user interface behavior and backend interactions, catching errors before they compound. Testing is delegated to a subagent, which keeps the main agent focused and efficient; autonomous runtime increased from 20 to over 200 minutes. The approach improves the functional integrity of applications while reducing testing costs.
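
The idea in the summary can be illustrated with a minimal sketch. This is not Replit's implementation and it does not use Playwright: ReplSession, FakeApp, and verify_save_feature are hypothetical names invented for this example, and an in-memory object stands in for a real browser and backend. It shows two things the summary describes: test steps run as code in a persistent REPL-style namespace, and a check that reads back state after a UI action so that a "Potemkin" feature (success banner, nothing saved) is caught.

```python
import contextlib
import io
import traceback


class ReplSession:
    """Hypothetical persistent namespace: names bound by one snippet are visible to the next."""

    def __init__(self, **initial):
        self.namespace = dict(initial)

    def run(self, source):
        """Execute one snippet and return (ok, output) instead of raising."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(source, self.namespace)
            return True, buffer.getvalue()
        except Exception:
            # Keep only the final traceback line so the report stays short.
            return False, traceback.format_exc().strip().splitlines()[-1]


class FakeApp:
    """Hypothetical stand-in for the app under test (assumption, not a real API)."""

    def __init__(self, persists):
        self.persists = persists
        self.saved = []

    def click_save(self, text):
        if self.persists:
            self.saved.append(text)
        return "Saved!"  # the UI reports success whether or not anything was stored

    def list_saved(self):
        return list(self.saved)


def verify_save_feature(app):
    """Subagent-style check: act through the UI, then read the state back."""
    session = ReplSession(app=app)
    steps = [
        "banner = app.click_save('hello')",
        "assert banner == 'Saved!', 'no success banner'",
        "assert 'hello' in app.list_saved(), 'banner shown but nothing persisted'",
    ]
    for step in steps:
        ok, output = session.run(step)
        if not ok:
            # Return a compact verdict rather than the full test transcript.
            return {"passed": False, "failed_step": step, "detail": output}
    return {"passed": True, "failed_step": None, "detail": ""}


if __name__ == "__main__":
    print(verify_save_feature(FakeApp(persists=True)))
    print(verify_save_feature(FakeApp(persists=False)))
```

Returning only a small verdict dictionary mirrors the subagent design mentioned in the summary: the caller receives a pass/fail result and the failing step, not every intermediate output. In a real setup the steps would drive a browser and query a live backend instead of calling methods on an in-memory object.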