Comfy Internals | How we got four rival AI labs to fight over our code reviews
Blog post from Comfy
Comfy built a code review system that draws on AI models from four different labs, on the premise that each brings a different perspective and can catch blind spots a single model would miss. The system fans out each pull request (PR) diff to models from OpenAI, Anthropic, Google, and Moonshot, runs two review passes per model, and consolidates the results with a single judge model. The aim is to catch subtle bugs, such as concurrency issues and API contract drift, that human reviewers can miss through fatigue and that models sharing the same training priors tend to miss together.

The system runs in continuous integration (CI) as a GitHub Action that costs $200/month, and it is designed to resist manipulation by malicious PRs. So far it has caught significant bugs in about 110 PRs, and the architecture has been open-sourced to invite further development and feedback from the engineering community.
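The fan-out-and-judge flow can be sketched as follows. This is a minimal illustration, not Comfy's implementation: it assumes each model is wrapped in an async function that takes the diff and a pass index and returns a list of finding strings. The reviewer names, `make_stub`, `fan_out`, and `judge` are all hypothetical. In particular, the post uses a judge *model*, while this sketch substitutes a simple deterministic stand-in that merges identical findings and ranks them by how many distinct labs reported them.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

# Hypothetical reviewer interface: (diff, pass_index) -> list of finding strings.
Reviewer = Callable[[str, int], Awaitable[list[str]]]


@dataclass(frozen=True)
class Finding:
    reviewer: str    # which lab/model reported it
    pass_index: int  # which review pass produced it
    text: str        # the finding itself


async def fan_out(diff: str, reviewers: dict[str, Reviewer], passes: int = 2) -> list[Finding]:
    """Send the same diff to every reviewer, `passes` times each, concurrently."""
    keys: list[tuple[str, int]] = []
    calls = []
    for name, review in reviewers.items():
        for p in range(passes):
            keys.append((name, p))
            calls.append(review(diff, p))
    results = await asyncio.gather(*calls)  # results keep the order of `calls`
    findings: list[Finding] = []
    for (name, p), texts in zip(keys, results):
        findings.extend(Finding(name, p, t) for t in texts)
    return findings


def judge(findings: list[Finding]) -> list[tuple[str, int]]:
    """Stand-in for the judge model (assumption): merge identical findings and
    rank them by the number of *distinct* reviewers that reported each one."""
    support: dict[str, set[str]] = {}
    for f in findings:
        support.setdefault(f.text, set()).add(f.reviewer)
    ranked = sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(text, len(names)) for text, names in ranked]


def make_stub(findings_by_pass: list[list[str]]) -> Reviewer:
    """Hypothetical offline reviewer that returns canned findings per pass."""
    async def review(diff: str, pass_index: int) -> list[str]:
        return findings_by_pass[pass_index]
    return review


async def main() -> list[tuple[str, int]]:
    # Canned results standing in for real API calls to the four labs.
    reviewers = {
        "openai": make_stub([["race in cache.update"], []]),
        "anthropic": make_stub([["race in cache.update"], ["API field renamed"]]),
        "google": make_stub([[], ["API field renamed"]]),
        "moonshot": make_stub([["race in cache.update"], []]),
    }
    findings = await fan_out("diff --git a/x b/x", reviewers)
    return judge(findings)


if __name__ == "__main__":
    print(asyncio.run(main()))  # [('race in cache.update', 3), ('API field renamed', 2)]
```

Counting distinct reviewers rather than raw hits means a model agreeing with itself across its two passes does not inflate a finding's support, which is one way to reflect the idea that independent labs with different training priors provide more signal than repeated samples from one model.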