OpenAI GPT-4.1: Multimodal and Vision Analysis
Blog post from Roboflow
Released on April 14th, 2025, GPT-4.1 is OpenAI's latest series of multimodal models, designed for tasks such as visual question answering (VQA), optical character recognition (OCR), and document understanding (for example, reading receipts). The series comes in three sizes: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. All three models are available through the OpenAI API and support a context window of 1 million tokens, making them suitable for tasks that require extensive context. OpenAI reports improved performance over previous models on benchmarks such as SWE-bench Verified (coding) and Video-MME (video understanding).

In our testing, GPT-4.1 successfully completed several vision tasks, including counting objects in images and reading text from documents. It struggled with object detection (returning accurate bounding box coordinates), a common weakness among multimodal language models. Notably, GPT-4.1 nano, despite being the smallest model in the series, outperformed the base model on some tasks, highlighting its efficiency in certain scenarios.

You can experiment with GPT-4.1 in the OpenAI Playground, and explore how it compares to other vision models using tools like Vision AI Checkup.
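As a starting point for your own experiments, here is a minimal sketch of a VQA request using the official `openai` Python SDK. The prompt and image URL are placeholders, and the example assumes an `OPENAI_API_KEY` is set in your environment:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask GPT-4.1 a question about an image (visual question answering).
# The prompt and image URL below are placeholders; swap in your own.
response = client.chat.completions.create(
    model="gpt-4.1",  # or "gpt-4.1-mini" / "gpt-4.1-nano"
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "How many coins are in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/coins.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The same request shape works for OCR-style prompts (for example, "Read the total from this receipt"); only the text prompt and image change.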