The article evaluates how well two OpenAI Large Language Models (LLMs), 4o and its reasoning-focused counterpart 4o-mini, detect subtle, complex bugs across multiple programming languages. The authors introduce a dataset of 210 intentionally difficult-to-catch bugs spanning Python, TypeScript, Go, Rust, and Ruby, and test both models against it. While both perform reasonably well, 4o-mini shows a slight advantage in identifying the most challenging bugs, especially in dynamically typed languages such as Ruby, where its reasoning capabilities prove valuable. The results highlight the importance of logical reasoning in AI-powered bug detection, particularly for less mainstream languages or environments with limited training data. Overall, the study underscores the growing significance of AI-driven reasoning models in software verification and suggests that continued improvements to these tools will be crucial for delivering safer, more reliable software.
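To make the evaluation setup more concrete, here is a minimal sketch of how such a benchmark could be wired up with the OpenAI Python SDK. The `BugCase` structure, the single stand-in dataset entry, the keyword-based grading, and the model identifiers are illustrative assumptions, not the article's actual harness, prompts, or scoring method.

```python
"""Sketch of a bug-detection benchmark harness (assumptions, not the authors' code)."""
from dataclasses import dataclass

from openai import OpenAI  # assumes the official OpenAI Python SDK (>=1.0)

client = OpenAI()  # reads OPENAI_API_KEY from the environment


@dataclass
class BugCase:
    language: str  # e.g. "python", "ruby"
    snippet: str   # source code containing one planted, hard-to-catch bug
    bug_hint: str  # short phrase a correct answer should mention


# Tiny stand-in for the article's 210-case dataset.
DATASET = [
    BugCase(
        language="python",
        snippet="def mean(xs):\n    return sum(xs) / len(xs)  # crashes on empty input\n",
        bug_hint="empty",
    ),
]


def model_flags_bug(model: str, case: BugCase) -> bool:
    """Ask the model to review the snippet and check, crudely, whether its
    answer mentions the planted bug. The keyword match is only a stand-in
    for whatever grading the authors actually used."""
    prompt = (
        f"Review this {case.language} code and describe any bugs you find:\n\n"
        f"{case.snippet}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    answer = response.choices[0].message.content or ""
    return case.bug_hint.lower() in answer.lower()


def detection_rate(model: str) -> float:
    """Fraction of dataset cases in which the model flags the planted bug."""
    hits = sum(model_flags_bug(model, case) for case in DATASET)
    return hits / len(DATASET)


if __name__ == "__main__":
    for model in ("gpt-4o", "gpt-4o-mini"):
        print(f"{model}: {detection_rate(model):.0%} of planted bugs flagged")
```

A harness along these lines would also make it easy to break detection rates down by language, which is how per-language differences such as the Ruby results could be surfaced.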