OpenAI's Whisper Large-v3 model, intended to enhance multilingual speech-to-text capabilities, faces significant challenges stemming from training biases and the limited availability of annotated data. Despite being marketed as a solution for low-resource languages, the model struggles with hallucinations, degraded punctuation, and unreliable accuracy in underrepresented languages, largely because its training data was sourced from platforms like YouTube. These biases are magnified when the model is fine-tuned on AI-generated annotations, a practice that particularly affects non-English languages. Moreover, performance metrics such as word error rate (WER) can be misleading, since they fail to capture real-world audio complexity or performance differences across speaker gender, age, and prosody. While Whisper remains a leading speech recognition tool, its development highlights the broader challenges of building inclusive AI systems.
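
To see why WER alone can mislead, it helps to recall how it is computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below is a minimal, generic implementation (not Whisper's or any benchmark's evaluation code), and the example transcripts are hypothetical, chosen only to show that two errors with the same WER can have very different practical impact.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# Hypothetical transcripts: both score the same WER (1 error / 6 words ≈ 0.17),
# but dropping a negation changes the meaning while a trailing filler does not.
reference = "the patient does not need surgery"
print(wer(reference, "the patient does need surgery"))      # drops "not"
print(wer(reference, "the patient does not need surgery ok"))  # harmless insertion
```

Because the metric weights every word error equally and says nothing about which words, which speakers, or which acoustic conditions drive the errors, a single aggregate WER can mask exactly the demographic and prosodic disparities described above.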