First Impressions with Gemini Advanced
Blog post from Roboflow
Introduced in December 2023, Gemini is a series of multimodal models developed by Google and DeepMind. Its latest version, Gemini Advanced, was released in February 2024. The Roboflow team ran qualitative tests to compare it with other multimodal models and with the initial Gemini release. Gemini Advanced improved on certain real-world OCR tasks, such as reading a serial number on a tire, but regressed on fundamental tasks like Visual Question Answering (VQA) and document OCR. The team also observed that it could not process multiple images at once, a capability available in other models such as GPT-4 with Vision and Qwen-VL-Plus. Despite these limitations, the model successfully identified anomalies in images. Because its performance was inconsistent across tasks, further testing and analysis are needed. The Roboflow team plans to continue exploring Gemini Advanced's capabilities and encourages community contributions to better understand the model's potential.