Diversity vs. Density: A data strategy comparison for fine-tuning VLMs
Blog post from Hugging Face
Akhil Theerthala compares two data curation strategies, diversity and density, for fine-tuning vision-language models (VLMs) in domains where image data is scarce. The diversity strategy pairs each of many distinct images with a single question; the density strategy asks multiple questions about each of a few images. In a controlled experiment on the GQA dataset, the diverse strategy generally outperforms the dense one, delivering consistent results across tasks: spreading supervision over many images helps prevent overfitting and supports more general reasoning, suggesting diversity can act as a form of regularization for VLMs.

Density is not dismissed, however. When data resources are tight, and especially for non-reasoning models, asking many questions per image can be an efficient alternative. The post concludes that the choice should be weighed against project requirements and available resources, and flags open questions for future work: the data scales at which each strategy is optimal, and whether synthetic diversity can substitute for genuinely varied images.
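To make the two strategies concrete, here is a minimal sketch (not code from the post) of how one might build equal-sized "diverse" and "dense" training subsets from VQA-style records such as GQA's. The `build_subsets` function, its record schema, and the fixed `budget` are illustrative assumptions:

```python
import random
from collections import defaultdict

def build_subsets(records, budget, seed=0):
    """Build two same-sized training subsets from VQA-style records
    (dicts with "image_id", "question", "answer" keys).

    - diverse: at most one question per image, spread over many images
      (assumes at least `budget` distinct images are available)
    - dense:   all questions per image, concentrated on few images
    """
    rng = random.Random(seed)
    by_image = defaultdict(list)
    for rec in records:
        by_image[rec["image_id"]].append(rec)

    image_ids = list(by_image)
    rng.shuffle(image_ids)

    # Diverse: one randomly chosen question from each of `budget` images.
    diverse = [rng.choice(by_image[i]) for i in image_ids[:budget]]

    # Dense: exhaust every question per image until the budget is filled.
    dense = []
    for i in image_ids:
        for rec in by_image[i]:
            if len(dense) == budget:
                break
            dense.append(rec)
        if len(dense) == budget:
            break
    return diverse, dense
```

With 20 images carrying 5 questions each and a budget of 10, the diverse subset touches 10 distinct images while the dense subset covers only 2, which is the contrast the experiment isolates.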