Large Vocabulary Speech Recognition Demystified
Blog post from Deepgram
In production, the hard part of large vocabulary speech recognition (LVSR) is not the size of a fixed dictionary but the density of out-of-vocabulary (OOV) terms in the audio. Transcription errors tend to cluster on specialized terms such as drug names, product codes, and legal jargon.
Keyterm Prompting is a solution for small, stable term sets. It adjusts the model's decoding to favor specific terms, which delivers immediate gains without retraining. It has limits, though: as a term list grows large or its entries become ambiguous, the risk of force-fitting errors rises, where the model substitutes a listed term for words the speaker actually said.
When those limits are reached, custom model training is the recommended path. Training builds the domain vocabulary into the model's learned representations, which makes it more robust and can yield significant accuracy improvements, at the cost of requiring domain audio data and a longer timeline.
The choice between Keyterm Prompting and custom training should therefore follow from the size and specificity of the domain vocabulary and from operational constraints, so that each deployment's particular vocabulary problem gets the right fix.
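To make "adjusting decoding to favor specific terms" and the force-fitting risk concrete, here is a minimal toy sketch. It is not Deepgram's implementation: it assumes a hypothetical `Hypothesis` type, an n-best list with made-up scores, and a flat per-match bonus applied after decoding (a production system may apply its bias inside beam search instead). The names `keyterm_hits`, `rescore`, and `boost` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One candidate transcript from a hypothetical n-best list."""
    text: str
    log_prob: float  # illustrative model score; higher is better


def keyterm_hits(text: str, keyterms: set[tuple[str, ...]]) -> int:
    """Count whole-word occurrences of each (possibly multi-word) keyterm."""
    tokens = text.lower().split()
    hits = 0
    for term in keyterms:
        n = len(term)
        # Slide a window of the term's length over the tokens and compare.
        hits += sum(
            1 for i in range(len(tokens) - n + 1) if tuple(tokens[i : i + n]) == term
        )
    return hits


def rescore(nbest: list[Hypothesis], keyterms: list[str], boost: float = 2.0) -> Hypothesis:
    """Return the hypothesis with the best score after a flat bonus per keyterm hit."""
    terms = {tuple(t.lower().split()) for t in keyterms}
    return max(nbest, key=lambda h: h.log_prob + boost * keyterm_hits(h.text, terms))


# Gain: the bonus lifts a rare drug name the model scored slightly lower.
nbest = [Hypothesis("prescribe met for men", -4.0), Hypothesis("prescribe metformin", -5.0)]
print(rescore(nbest, ["metformin"]).text)

# Force-fitting: a listed term the speaker never said overrides the correct text.
nbest = [Hypothesis("take it with food", -3.0), Hypothesis("tekturna with food", -4.5)]
print(rescore(nbest, ["metformin", "tekturna"]).text)
```

Because the bonus is flat, every term added to the list is another chance to overpower a correct hypothesis, and lowering `boost` reduces that risk only by also weakening the gain. This trade-off is one way to see why large or ambiguous lists push toward custom training.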