OpenAI's o3-mini and Anthropic's Sonnet 3.7, two AI models used as automated code reviewers, were compared on a benchmark of 210 programs containing realistic but hard-to-catch bugs across multiple languages and domains. Both models performed competitively: o3-mini scored slightly higher overall, while Sonnet 3.7 showed stronger reasoning on edge cases involving concurrency and async behavior, particularly in TypeScript and Go. The study concludes that there is no universal winner; rather, strengths shift with language, bug type, and model architecture.
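To make the "hard-to-catch async bug" category concrete, here is a minimal TypeScript sketch of the kind of subtle defect such a benchmark targets. This example is illustrative only and is not taken from the evaluation dataset; the function names are hypothetical. The bug is that `Array.prototype.forEach` discards the promises returned by an async callback, so the accumulator is read before any callback finishes:

```typescript
// Simulated async lookup (stand-in for a real I/O call).
async function fetchScore(id: number): Promise<number> {
  return new Promise((resolve) => setTimeout(() => resolve(id * 10), 10));
}

// BUG: forEach ignores the promises returned by the async callback,
// so `total` is returned before any `await` inside it has resolved.
async function sumScoresBuggy(ids: number[]): Promise<number> {
  let total = 0;
  ids.forEach(async (id) => {
    total += await fetchScore(id);
  });
  return total; // resolves to 0: the callbacks are still pending
}

// FIX: collect the promises explicitly and await them all before summing.
async function sumScoresFixed(ids: number[]): Promise<number> {
  const scores = await Promise.all(ids.map(fetchScore));
  return scores.reduce((a, b) => a + b, 0);
}

async function main(): Promise<void> {
  console.log(await sumScoresBuggy([1, 2, 3])); // 0
  console.log(await sumScoresFixed([1, 2, 3])); // 60
}
main();
```

The buggy version type-checks and runs without error, which is precisely what makes this class of defect difficult for both humans and models to catch: the failure is semantic (a lost await), not syntactic.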