
Launch: Fine-Tune Florence-2 for VQA with Roboflow

Blog post from Roboflow

Post Details
Company: Roboflow
Author: James Gallagher
Word Count: 1,046
Language: English
Summary

James Gallagher's article explains how to fine-tune Florence-2, Microsoft's multimodal computer vision model, for visual question answering (VQA) using the Roboflow platform. The guide walks users through creating a dataset of image-text pairs in Roboflow, setting up the text prefixes used for labeling, and processing the data for training, including generating a dataset version. It then shows how to use Roboflow Train to fine-tune Florence-2 in the cloud and how to deploy the resulting model on their own hardware with Roboflow Inference. The article also covers labeling data with JSON payloads, an approach it presents as improving model performance on tasks such as object detection and OCR. In its worked example, the fine-tuned model runs locally with Inference and extracts specific fields from receipt images, such as the subtotal and total.
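To make the idea of an image-text pair with a JSON payload label concrete, here is a minimal sketch using only the Python standard library. It assumes each training example is stored as one JSONL line with `image`, `prefix` (the question) and `suffix` (the expected answer) fields; these field names, the helper functions `make_pair` and `parse_answer`, and the receipt values are all hypothetical illustrations, not Roboflow's documented export format or API. The sketch shows how the answer can itself be a JSON string, and how such output could be parsed back into fields like `subtotal` and `total`.

```python
import json

# Assumption: one image-text pair per JSONL line, with hypothetical
# "image", "prefix" (question) and "suffix" (answer) fields.


def make_pair(image_name: str, question: str, answer: dict) -> str:
    """Serialize one image-text pair as a single JSONL line (hypothetical helper)."""
    record = {
        "image": image_name,
        "prefix": question,
        # The answer is stored as a JSON string so a model trained on it
        # learns to emit a machine-parseable payload.
        "suffix": json.dumps(answer, sort_keys=True),
    }
    return json.dumps(record)


def parse_answer(generated_text: str) -> dict:
    """Parse text expected to be a JSON object; return {} if it is malformed."""
    try:
        payload = json.loads(generated_text)
    except json.JSONDecodeError:
        return {}
    # Only accept a JSON object, since we expect named fields.
    return payload if isinstance(payload, dict) else {}


if __name__ == "__main__":
    # Illustrative, made-up receipt values.
    line = make_pair(
        "receipt_001.jpg",
        "What are the subtotal and total on this receipt?",
        {"subtotal": "12.50", "total": "13.75"},
    )
    print(line)

    # Stand-in for model output: reuse the stored suffix text verbatim.
    record = json.loads(line)
    fields = parse_answer(record["suffix"])
    print(fields.get("subtotal"), fields.get("total"))
```

Storing the answer as serialized JSON, rather than free text, is a design choice that lets downstream code validate output with a single `json.loads` call and fall back cleanly when the model produces something malformed.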