
Cohere Aya Vision: Multimodal and Vision Analysis

Blog post from Roboflow

Post Details
Company: Roboflow
Author: James Gallagher
Word Count: 1,092
Language: English
Summary

Cohere Aya Vision, released on March 3, 2025, is a multimodal model from Cohere, licensed for non-commercial use under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license. It comes in two sizes, 8B and 32B parameters, and is available through Hugging Face, Kaggle, the Cohere Playground, and WhatsApp. The model supports 23 languages and is reported to outperform several existing models on multilingual multimodal tasks. The post tests Aya Vision on object counting, visual question answering, document OCR, and real-world OCR. The model identified objects and answered a range of questions correctly, but it showed limitations in document OCR and occasionally returned incorrect or incomplete information. Alongside the release, Cohere introduced AyaVisionBench, a benchmark dataset spanning 23 languages and 9 task categories, for evaluating capabilities in tasks such as image captioning and chart understanding.