Launch: Fine-Tune Florence-2 for VQA with Roboflow
Blog post from Roboflow
In this article, James Gallagher explains how to fine-tune Florence-2, Microsoft's multimodal computer vision model, for visual question answering (VQA) on the Roboflow platform. The guide walks through creating an image-text pair dataset on Roboflow, setting up the prefixes used for labeling, and processing the data for training. Step by step, it covers labeling images with JSON payloads, creating a dataset version, and fine-tuning Florence-2 in the cloud with Roboflow Train, noting that this labeling approach can improve model performance on tasks such as object detection and OCR. Once fine-tuned, the model can be deployed and run locally on the user's own hardware with Roboflow Inference. The article's example shows the model identifying specific fields in receipt data, such as subtotals and totals.
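
To make the labeling idea concrete, here is a minimal Python sketch of what one image-text pair with a JSON payload might look like, and how a JSON answer returned by the model could be read back. It uses only the standard library and does not call any Roboflow API. The field names `image`, `prefix`, and `suffix`, the question text, the file name, and the receipt values are all assumptions chosen for illustration, not details taken from the article.

```python
import json

# Hypothetical prefix: the question shown to the model for every receipt image.
PREFIX = "What are the subtotal and total on this receipt?"


def make_record(image_name: str, subtotal: str, total: str) -> str:
    """Build one JSON Lines entry pairing an image with a prefix and a JSON answer.

    The keys "image", "prefix", and "suffix" are assumed names for this sketch.
    """
    answer = json.dumps({"subtotal": subtotal, "total": total})
    record = {"image": image_name, "prefix": PREFIX, "suffix": answer}
    return json.dumps(record)


def parse_answer(model_text: str) -> dict:
    """Parse the model's text output; return an empty dict if it is not a JSON object."""
    try:
        parsed = json.loads(model_text)
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    line = make_record("receipt_001.jpg", "12.50", "13.63")
    label = json.loads(line)["suffix"]
    print(parse_answer(label))
```

Storing the answer as a JSON string keeps each label machine-readable, so the same parsing step can be applied to the fine-tuned model's output. Falling back to an empty dict is one simple way to handle responses that are not valid JSON.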