
SynthVision: Building a 110K Synthetic Medical VQA Dataset with Cross-Model Validation

Blog post from HuggingFace

Post Details
Author
Maziyar Panahi, merve, Jamie@Doubleword, Josh, Seb Ringrose, and Fergus Finn
Word Count
3,730
Summary

SynthVision is a collaboration between OpenMed, Hugging Face, and Doubleword that produced a synthetic medical Visual Question Answering (VQA) dataset of 110,000 records from 119,000 annotated medical images. The dataset was built with two vision-language models (Qwen 3.5 and Kimi K2.5), reached 93% cross-validation agreement, and cost under $500 to produce. The project aims to address the limited size and scope of existing medical VQA datasets, such as VQA-RAD, by distilling knowledge from large models into smaller ones. Doubleword's API handled batch annotation and cross-validation, and the resulting data was used to fine-tune three small models (2-3 billion parameters). The fine-tuned models improved across benchmarks, with the best showing a 15% average improvement in exact match. All data, code, and models have been open-sourced to encourage reproducibility and further research in the medical AI community.
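The summary does not say how cross-model agreement was measured, so the following is only a minimal sketch of one plausible approach: compare the two models' answers after light normalization and report the fraction that match exactly. The function names (`normalize`, `agreement_rate`, `keep_agreed`) and the normalization rules are illustrative assumptions, not the project's actual code.

```python
import string


def normalize(answer: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace.

    Assumed normalization; the project may use different rules.
    """
    text = answer.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def agreement_rate(answers_a: list[str], answers_b: list[str]) -> float:
    """Fraction of paired answers that match exactly after normalization."""
    if len(answers_a) != len(answers_b):
        raise ValueError("answer lists must have the same length")
    if not answers_a:
        return 0.0
    matches = sum(normalize(a) == normalize(b) for a, b in zip(answers_a, answers_b))
    return matches / len(answers_a)


def keep_agreed(records: list[dict], answers_a: list[str], answers_b: list[str]) -> list[dict]:
    """Keep only the records on which both models gave the same normalized answer."""
    return [
        rec
        for rec, a, b in zip(records, answers_a, answers_b)
        if normalize(a) == normalize(b)
    ]


if __name__ == "__main__":
    # Toy, made-up answers purely for illustration.
    model_a = ["Yes.", "left lung", "CT"]
    model_b = ["yes", "Left  lung", "MRI"]
    print(f"agreement: {agreement_rate(model_a, model_b):.2f}")
```

Exact match after normalization is a strict criterion; free-text medical answers that are semantically equivalent but worded differently would count as disagreements under this sketch.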