How to Use Florence-2 for Optical Character Recognition
Blog post from Roboflow
Microsoft's Florence-2, released in June 2024, is a multimodal vision model that handles tasks such as image captioning, object detection, and Optical Character Recognition (OCR). It is released under the MIT license, which permits commercial use, and its weights are relatively small at 1.54 GB. The model performs well at OCR, including fast and accurate transcription of handwritten text. It offers two OCR modes: one returns all the text in an image as a single string, and the other returns each piece of text together with a bounding box that locates it. The guide walks through using Florence-2 for OCR with the Hugging Face Transformers library: setting up dependencies, running inference, and visualizing the detected text regions. The model may still miss text in some images, in which case pre-processing, such as dividing the image into sections and running OCR on each one, can help it recognize all of the text.
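The pre-processing step of dividing an image into sections can be sketched in a few lines of plain Python. This is a minimal illustration, not the guide's own code: `tile_boxes`, `ocr_in_sections`, and the `run_ocr` callback are hypothetical names, and `run_ocr` stands in for whatever function crops the image to a box and passes the crop to Florence-2.

```python
from typing import Callable, List, Tuple

# (left, top, right, bottom) in pixels
Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, rows: int, cols: int,
               overlap: int = 0) -> List[Box]:
    """Split a width x height image into rows x cols crop boxes.

    Each tile is grown by `overlap` pixels on every side (clamped to the
    image bounds) so that text near a seam is less likely to be cut off.
    Boxes are returned in reading order: left to right, top to bottom.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be at least 1")
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = max(0, c * width // cols - overlap)
            top = max(0, r * height // rows - overlap)
            right = min(width, (c + 1) * width // cols + overlap)
            bottom = min(height, (r + 1) * height // rows + overlap)
            boxes.append((left, top, right, bottom))
    return boxes


def ocr_in_sections(width: int, height: int,
                    run_ocr: Callable[[Box], str],
                    rows: int = 2, cols: int = 1,
                    overlap: int = 0) -> str:
    """Call run_ocr on each tile and join the non-empty results with newlines.

    run_ocr is a hypothetical callback: it receives a crop box and is
    expected to return the text the OCR model found inside that box.
    """
    texts = [run_ocr(box).strip()
             for box in tile_boxes(width, height, rows, cols, overlap)]
    return "\n".join(t for t in texts if t)
```

With Pillow, each box can be passed directly to `image.crop(box)`, since it uses the same (left, upper, right, lower) order. A non-zero `overlap` trades a little duplicated text at the seams for a lower chance of splitting a line of text in half, so the joined output may need de-duplication.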