OpenAI o3 and o4-mini: Multimodal and Vision Analysis
Blog post from Roboflow
OpenAI's newly released multimodal models, o3 and o4-mini, are designed as part of the "reasoning" series, enabling integration of images into their analytical processes. Both models were evaluated using various tasks, including object counting, visual question answering, and real-world OCR, with o4-mini passing four out of seven tests, while o3 passed three. Despite their reasoning capabilities, both models underperformed in comparison to OpenAI's other models like GPT-4.1, particularly in tasks like object counting and object detection, where they exhibited variability and errors. Available via the OpenAI API and Playground, these models utilize a "chain of thought" mechanism to provide reasoned answers, which is beneficial for complex analytical tasks.