LLaVA 1.6 is an updated collection of Large Language-and-Vision Assistant models that supports higher image resolutions with up to four times more pixels, and improves text recognition and reasoning through additional training on document, chart, and diagram datasets. The models come in 7B, 13B, and a new 34B parameter size, and are distributed under more permissive licenses, including Apache 2.0 and the LLaMA 2 Community License. They can be used through the Ollama CLI, the Python and JavaScript libraries, or the REST API for tasks such as image description, object detection, and text recognition; a short usage sketch follows below. Training details and benchmark results comparing these models against other leading models are available on the LLaVA website.
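As a minimal sketch of library usage, the following assumes the `ollama` Python package is installed (`pip install ollama`) and an Ollama server is running with the `llava` model already pulled; the `photo.jpg` filename is a placeholder, not a file referenced by the release:

```python
import ollama

# Ask the llava model to describe a local image.
response = ollama.chat(
    model='llava',
    messages=[{
        'role': 'user',
        'content': 'Describe this image in one sentence.',
        'images': ['./photo.jpg'],  # placeholder path to a local image file
    }],
)

print(response['message']['content'])
```

The same request can be issued through the CLI (`ollama run llava`) or the REST API; the Python library simply wraps the server's chat endpoint, attaching image data alongside the prompt.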