Vividh-ASR: Diagnosing and Fixing Studio-Bias in Whisper for Indic Languages
Blog post from HuggingFace
Vividh-ASR tackles the studio bias of existing ASR models for Indic languages with two contributions: a benchmark stratified by acoustic complexity, and a Whisper fine-tuning recipe that makes the model robust across recording conditions.

The work comes from Adalat AI, which builds ASR for the Indian judiciary: courtroom audio is spontaneous, recorded under highly varied conditions, and must be transcribed efficiently at large concurrent scale.

The key finding challenges standard fine-tuning assumptions. Simply fine-tuning Whisper with a high learning rate substantially outperforms existing models on Hindi and Malayalam, with no architectural changes and no proprietary data. Curriculum order matters only for Malayalam, where a reverse (hard-to-easy) curriculum, training on the harder acoustic conditions first, adds a further gain; for Hindi, the high learning rate alone suffices. The result is a 244M-parameter Whisper model that surpasses much larger models on overall word error rate (WER).

The released models and benchmark significantly outperform existing baselines and offer practical guidance for practitioners working on low-resource Indic languages. Sketches of the main ideas follow.
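A minimal sketch of the recipe's core move, using the Hugging Face `Seq2SeqTrainer`. The learning rate, step counts, and dataset placeholders here are illustrative assumptions, not the post's published hyperparameters; `openai/whisper-small` is the 244M-parameter checkpoint the summary refers to.

```python
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)

# openai/whisper-small is the 244M-parameter Whisper checkpoint.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="hindi", task="transcribe"
)

# Assumed to be prepared elsewhere: log-mel input features plus label token
# ids, and a padding collator for speech seq2seq batches.
train_dataset = ...
eval_dataset = ...
data_collator = ...

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-hi-high-lr",
    learning_rate=1e-4,  # the "high" LR; ~1e-5 is a common Whisper fine-tuning default
    warmup_steps=500,
    max_steps=5000,
    per_device_train_batch_size=16,
    fp16=True,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=data_collator,
)
trainer.train()
```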
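For the Malayalam-style reverse curriculum, one simple way to realize hard-to-easy ordering is two sequential training phases. The `condition` field and the two-phase scheme are assumptions for illustration; the summary does not spell out how the acoustic-complexity strata are encoded or scheduled.

```python
# Continuing from the sketch above: split the training data by an assumed
# "condition" label from the benchmark's acoustic-complexity stratification.
hard = train_dataset.filter(lambda ex: ex["condition"] == "hard")  # e.g. noisy, far-field
easy = train_dataset.filter(lambda ex: ex["condition"] == "easy")  # e.g. clean, studio-like

# Phase 1: train on the harder acoustic conditions first (reverse curriculum).
trainer.train_dataset = hard
trainer.train()

# Phase 2: continue from the phase-1 weights on the easier conditions.
trainer.train_dataset = easy
trainer.train()
```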
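Since WER is the headline metric, the `evaluate` library's `wer` metric is the usual way to score transcripts. The strings below are made-up examples, and text normalization (casing, punctuation) should match the benchmark's protocol.

```python
import evaluate

wer_metric = evaluate.load("wer")

# Predictions and references are plain transcript strings.
wer = wer_metric.compute(
    predictions=["adalat ai builds asr for courts"],
    references=["adalat ai builds asr for the courts"],
)
print(f"WER: {wer:.3f}")
```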