Review - JUST: Joint Unsupervised and Supervised Training For Multilingual ASR

What's this blog post about?

The paper "JUST - JOINT UNSUPERVISED AND SUPERVISED TRAINING FOR MULTILINGUAL ASR" presents a novel Wav2Vec2-inspired pre-training technique for multilingual automatic speech recognition (ASR). JUST utilizes a five-stage modeling architecture with three stage-level unsupervised and supervised loss functions. The proposed approach achieves a 32% performance increase over the first-stage Wav2Vec2 XLSR network in low-resource language ASR settings. Key findings include the use of contrastive MLM (Masked Language Modelling) and RNN-T losses for joint pre-training on audio-text pairs across multiple languages, leading to more useful information extraction, better generalization, and robust contextualized token prediction. JUST outperforms Wav2Vec2 by using only the MLS dataset for pre-training, demonstrating its effectiveness in multilingual ASR tasks with fewer data requirements.

Company
AssemblyAI

Date published
Dec. 15, 2021

Author(s)
Luka Chkhetiani

Word count
717

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.