
How to Fine-Tune a SmolVLM2 Model on a Custom Dataset

Blog post from Roboflow

Post Details
Company: Roboflow
Author: James Gallagher
Word Count: 1,486
Language: English
Summary

SmolVLM2, released by Hugging Face in February 2025, is a multimodal model that accepts images and video and is suited to tasks such as visual question answering and structured information retrieval. This guide by James Gallagher walks through fine-tuning and deploying SmolVLM2 with Roboflow, using the example of converting a shipping manifest into JSON. It covers preparing and annotating a dataset, creating a dataset version, and training the model on a pre-labeled shipping manifest dataset. The trained model is then deployed through Roboflow Workflows on a GPU Dedicated Deployment. The guide closes by showing the model reading a manifest and returning its contents as a structured JSON object, and gives instructions for running the model both in the cloud and on your own hardware.
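To make the "structured JSON object" step concrete, here is a minimal Python sketch of how a client might turn a vision-language model's raw text reply into a dictionary. It is not taken from the guide and does not call any Roboflow or Hugging Face API. The helper name `parse_model_json`, the sample reply, and the field names `shipment_id`, `items`, `name`, and `quantity` are all hypothetical and chosen only for illustration.

```python
import json


def parse_model_json(raw: str) -> dict:
    """Extract the outermost JSON object from a model's text reply.

    Hypothetical helper: models often wrap JSON in extra prose or a
    Markdown code fence, so we slice from the first "{" to the last "}"
    before parsing.
    """
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    # json.loads raises json.JSONDecodeError (a ValueError subclass)
    # if the sliced text is not valid JSON.
    return json.loads(raw[start:end + 1])


# Illustrative reply only; the field names are assumptions, not the
# schema used in the original guide.
raw_reply = (
    "Here is the manifest:\n"
    "```json\n"
    '{"shipment_id": "A-1001", "items": [{"name": "widgets", "quantity": 12}]}\n'
    "```"
)

manifest = parse_model_json(raw_reply)
total_units = sum(item["quantity"] for item in manifest["items"])
```

Slicing between the outermost braces is a deliberately simple choice. It tolerates surrounding prose and code fences, but it will fail with a parsing error if the reply contains more than one separate JSON object.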