Company
Date Published
Author: Frederik Hvilshøj
Word count: 776
Language: English
Hacker News points: None

Summary

In recent months, Encord has integrated the Segment Anything Model (SAM) and introduced the Large Language and Vision Assistant (LLaVA) into its annotation platform, Encord Annotate, marking significant advances in automated labeling technology. LLaVA, a pioneering multimodal model, excels at image understanding and at following complex instructions despite a comparatively small training dataset, performing on par with GPT-4 at interpreting images, though it still struggles with Optical Character Recognition (OCR). The open-source model improves chat capabilities and Science Question Answering (QA), and its integration lets Encord label images using natural language, producing faster and more accurate annotations than traditional methods. Incorporating LLaVA also aligns with Encord's commitment to data privacy, since data remains within the company's own infrastructure, and the platform's new capabilities promise to accelerate annotation workflows across many domains, with ongoing improvements aimed at handling more complex ontologies.