SynthVision: Building a 110K Synthetic Medical VQA Dataset with Cross-Model Validation
Blog post from Hugging Face
SynthVision is a collaboration between OpenMed, Hugging Face, and Doubleword that produced a synthetic medical Visual Question Answering (VQA) dataset of 110,000 records from 119,000 annotated medical images. The dataset was built with two vision-language models, Qwen 3.5 and Kimi K2.5, reached 93% agreement in cross-model validation, and cost under $500 to develop. The project aims to address the limited size and scope of existing medical VQA datasets, such as VQA-RAD, by using knowledge distillation to transfer knowledge from large models to smaller ones. Doubleword's API was used for efficient batch annotation and cross-validation. The resulting data was then used to fine-tune three small models (2-3 billion parameters), which improved performance across benchmarks; the best model showed a 15% average improvement in exact match. All data, code, and models have been open-sourced to encourage reproducibility and further research in the medical AI community.
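To make the two headline metrics concrete, here is a minimal Python sketch of how a cross-model agreement rate and an exact match score could be computed. It is an illustration under stated assumptions, not the project's actual pipeline: the record keys `generator_answer` and `validator_answer` are hypothetical, and agreement is approximated by comparing normalized answer strings, whereas the real validation step may have used the second model as a judge.

```python
import string


def normalize(answer: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    stripped = answer.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def exact_match(prediction: str, reference: str) -> bool:
    """True when the two answers are identical after normalization."""
    return normalize(prediction) == normalize(reference)


def agreement_rate(records: list[dict]) -> float:
    """Fraction of records where both models give the same normalized answer.

    Each record is assumed (hypothetically) to carry the answer from the
    generating model and the answer from the validating model.
    """
    if not records:
        return 0.0
    agreed = sum(
        exact_match(r["generator_answer"], r["validator_answer"]) for r in records
    )
    return agreed / len(records)


def keep_agreed(records: list[dict]) -> list[dict]:
    """Keep only the records on which the two models agree."""
    return [
        r for r in records
        if exact_match(r["generator_answer"], r["validator_answer"])
    ]
```

Under this reading, a 93% agreement figure would mean `agreement_rate` returns roughly 0.93 over the annotated records, and `keep_agreed` would be the filter that turns the annotated pool into the final dataset. The same `exact_match` function can score a fine-tuned model's predictions against benchmark reference answers.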