Home / Companies / Roboflow / Blog / Post Details
Content Deep Dive

OpenAI o3 and o4-mini: Multimodal and Vision Analysis

Blog post from Roboflow

Post Details
Company
Date Published
Author
James Gallagher
Word Count
1,301
Language
English
Hacker News Points
-
Summary

OpenAI's newly released multimodal models, o3 and o4-mini, are designed as part of the "reasoning" series, enabling integration of images into their analytical processes. Both models were evaluated using various tasks, including object counting, visual question answering, and real-world OCR, with o4-mini passing four out of seven tests, while o3 passed three. Despite their reasoning capabilities, both models underperformed in comparison to OpenAI's other models like GPT-4.1, particularly in tasks like object counting and object detection, where they exhibited variability and errors. Available via the OpenAI API and Playground, these models utilize a "chain of thought" mechanism to provide reasoned answers, which is beneficial for complex analytical tasks.