Flux Just Got A Little Smarter
Blog post from Deepgram
Flux has been retrained under a new training paradigm that improves transcription accuracy and reduces false positives, particularly in start-of-turn detection. Many speech-to-text systems finalize transcripts based on wall-clock time or pause duration. Flux instead finalizes on conversation time: it detects the end of a conversational turn with low latency and delivers a high-quality transcript as soon as that turn ends.

The new version, Flux V0.1, transcribes more conservatively and optimizes accuracy specifically at the end of a turn. The result is a 70% reduction in false positives and faster end-of-turn detection. The model still revises its transcript throughout a turn, but it is less likely than its predecessor to output incorrect words prematurely. This conservativeness also improves transcription quality, with significant accuracy gains on various data sets, including a notable 10% improvement on a Common Voice test set. Developers get the improved performance without changing their existing implementations, because the update has already been applied.
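To make the turn-based finalization concrete, here is a minimal sketch of how a client might consume such a stream: interim text is overwritten as the model revises it, and a transcript is committed only when the turn ends. The event shape (`"update"` and `"end_of_turn"` dictionaries) and the `TurnBuffer` class are hypothetical names chosen for illustration. They are not Deepgram's actual API or message format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TurnBuffer:
    """Holds the latest interim transcript and commits text only at end of turn."""

    interim: str = ""
    committed: List[str] = field(default_factory=list)

    def handle(self, event: dict) -> None:
        # Hypothetical event shape (an assumption, not a real API):
        # {"type": "update" | "end_of_turn", "transcript": str}
        if event["type"] == "update":
            # Interim text can still be revised, so overwrite instead of appending.
            self.interim = event["transcript"]
        elif event["type"] == "end_of_turn":
            # Treat the end-of-turn transcript as final for this turn.
            self.committed.append(event["transcript"])
            self.interim = ""


# Simulated stream: the second update contains a word that is later revised.
events = [
    {"type": "update", "transcript": "I want to"},
    {"type": "update", "transcript": "I want to book a fight"},
    {"type": "update", "transcript": "I want to book a flight"},
    {"type": "end_of_turn", "transcript": "I want to book a flight"},
    {"type": "update", "transcript": "to Boston"},
    {"type": "end_of_turn", "transcript": "to Boston please"},
]

buffer = TurnBuffer()
for event in events:
    buffer.handle(event)

print(buffer.committed)  # ['I want to book a flight', 'to Boston please']
```

In this sketch, downstream logic acts only on `committed`, so a prematurely emitted word such as "fight" never reaches it. A more conservative model reduces how often such revisions appear in the interim text in the first place.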