Review - JUST: Joint Unsupervised and Supervised Training For Multilingual ASR

Blog post from AssemblyAI

Post Details
Company: AssemblyAI
Date Published:
Author: Luka Chkhetiani
Word Count: 717
Company Posts That Month: 16
Language: English
Hacker News Points: -
Summary

The paper "JUST - Joint Unsupervised and Supervised Training for Multilingual ASR" presents a Wav2Vec2-inspired pre-training technique for multilingual automatic speech recognition (ASR). JUST uses a five-stage modeling architecture trained with three stage-level loss functions, two unsupervised and one supervised: a contrastive loss, an MLM (masked language modeling) loss, and an RNN-T loss. The approach reports a 32% improvement over the first-stage Wav2Vec2 XLSR network in low-resource language ASR settings. The key finding is that optimizing these losses jointly on audio-text pairs across multiple languages leads to more useful information extraction, better generalization, and more robust contextualized token prediction. JUST outperforms Wav2Vec2 while using only the MLS (Multilingual LibriSpeech) dataset for pre-training, which suggests it is effective for multilingual ASR with lower data requirements.
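
To make the idea of joint training concrete, the following is a minimal sketch of how a contrastive loss, an MLM loss, and an RNN-T loss might be combined into one objective. It is an illustration under stated assumptions, not the paper's implementation: the function names, the temperature value, and the `unsup_weight` weighting scheme are all hypothetical, and the MLM and RNN-T terms are passed in as already-computed numbers rather than derived from a model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(context, target, distractors, temperature=0.1):
    """Wav2Vec2-style contrastive loss for one masked frame (illustrative).

    The context vector should be more similar to the true target than
    to any distractor. The temperature value is an assumption.
    """
    candidates = [target] + list(distractors)
    logits = [cosine(context, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    # Negative log-probability of picking the true target (index 0).
    return log_sum - logits[0]


def joint_loss(contrastive, mlm, rnnt, unsup_weight=1.0):
    """Supervised RNN-T loss plus a weighted sum of the unsupervised losses.

    The single-weight scheme is a hypothetical simplification.
    """
    return rnnt + unsup_weight * (contrastive + mlm)


# A context vector that matches its target yields a near-zero contrastive loss.
c_loss = contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
total = joint_loss(c_loss, mlm=1.0, rnnt=2.0, unsup_weight=0.1)
print(total)
```

The design point is that all three terms contribute to a single scalar, so one optimizer step updates the network for the unsupervised and supervised objectives at the same time, rather than pre-training first and fine-tuning afterwards.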

Trends Found in this Post
Trend: AI Model Fine-tuning
Post Mentions: 5
Total Month Mentions, Posts, Companies, MoM: No monthly metrics for this publish month.