Keyterm prompting for real-time accuracy: boosting names, jargon, and product terms
Blog post from AssemblyAI
Keyterm prompting improves the accuracy of real-time speech-to-text on specific, high-value words such as names, jargon, and product terms, which generic models often miss. A list of important words and phrases is passed to the model via the keyterms_prompt parameter, which biases the model toward recognizing them during transcription. Boosting happens in two stages: word-level boosting, applied live as audio comes in, and turn-level boosting, which uses phonetic matching to correct sound-alike errors. The approach is particularly effective with AssemblyAI's Universal-3.5 Pro Realtime model; it allows the list to be updated dynamically throughout a conversation and improves transcription accuracy even for accented speech. Overloading the keyterms list can cause overcorrection, so the list should stay short and limited to genuinely critical terms. Combined with other features such as conversation and agent context, keyterm prompting supports precise, reliable speech recognition, especially in environments where specific vocabulary is crucial.
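
To make the two ideas above concrete, here is a minimal Python sketch, not AssemblyAI's implementation. It assumes a hypothetical helper build_keyterms that keeps the list focused (deduplicated and capped at an arbitrary MAX_TERMS, which is not a documented limit) and a toy correct_turn that imitates turn-level correction. The toy pass uses difflib string similarity as a stand-in for real phonetic matching, and only the keyterms_prompt parameter name comes from the post; how the list is actually sent to the service is not shown here.

```python
import difflib

MAX_TERMS = 50  # hypothetical cap for this sketch, not a documented limit


def build_keyterms(candidates, max_terms=MAX_TERMS):
    """Return a trimmed, case-insensitively deduplicated keyterm list.

    First-seen order and original casing are preserved.
    """
    seen = set()
    terms = []
    for raw in candidates:
        term = " ".join(raw.split())  # strip and collapse stray whitespace
        key = term.casefold()
        if not term or key in seen:
            continue
        seen.add(key)
        terms.append(term)
    return terms[:max_terms]


def correct_turn(transcript, keyterms, cutoff=0.8):
    """Toy turn-level pass over a finished turn.

    Replaces any word that closely resembles a single-word keyterm.
    difflib similarity is a stand-in for phonetic matching (an assumption).
    """
    single = {t.casefold(): t for t in keyterms if " " not in t}
    corrected = []
    for word in transcript.split():
        match = difflib.get_close_matches(
            word.casefold(), list(single), n=1, cutoff=cutoff
        )
        corrected.append(single[match[0]] if match else word)
    return " ".join(corrected)


keyterms = build_keyterms(["Kubernetes", "kubernetes ", "  Ozempic", "", "Universal-3.5 Pro"])

# Only the parameter name is taken from the post; the transport is assumed.
config = {"keyterms_prompt": keyterms}

fixed = correct_turn("we deployed it on cubernetes yesterday", keyterms)
```

In this sketch, lowering cutoff or adding many loosely related terms makes correct_turn replace words it should leave alone, which mirrors the overcorrection risk of an overloaded list.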