Flux Just Got A Little Smarter
Blog post from Deepgram
Flux has been updated with a new training paradigm that improves transcription accuracy and reduces false positives, particularly in start-of-turn detection. Many speech-to-text systems finalize transcripts based on wall-clock time or pause duration. Flux instead uses conversation time for finalization, which gives low-latency end-of-turn detection and makes a high-quality transcript available as soon as a conversational turn ends. The new version, Flux V0.1, transcribes more conservatively and is optimized for accuracy at the end of a turn, which yields a 70% reduction in false positives and faster end-of-turn detection. The model still revises its transcript during a turn, but it is less likely than its predecessor to output incorrect words prematurely. This more conservative behavior also improves transcription quality, with significant accuracy gains on several datasets, including a notable 10% improvement on a Common Voice test set. Because the update has already been applied, developers get the improved performance without changing their existing implementations.
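To make the turn-based behavior concrete, here is a minimal sketch of how a client might consume a stream of transcript events, acting only on the transcript delivered at the end of a turn and counting how often interim text was revised. Everything in it is an assumption for illustration: the `TurnEvent` type, the `"update"` and `"end_of_turn"` event kinds, and both helper functions are hypothetical and are not taken from Deepgram's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TurnEvent:
    # Hypothetical event shape; real event names and fields may differ.
    kind: str        # "update" (interim text) or "end_of_turn" (final text)
    transcript: str  # full transcript of the current turn so far


def final_transcripts(events: List[TurnEvent]) -> List[str]:
    """Keep one transcript per turn, taken only at the end-of-turn event."""
    return [e.transcript for e in events if e.kind == "end_of_turn"]


def count_revisions(events: List[TurnEvent]) -> int:
    """Count events whose text rewrites, rather than extends, the previous text.

    An event counts as a revision when its transcript does not start with
    the transcript of the previous event in the same turn.
    """
    revisions = 0
    previous = ""
    for event in events:
        if not event.transcript.startswith(previous):
            revisions += 1
        # A new turn starts from an empty transcript.
        previous = "" if event.kind == "end_of_turn" else event.transcript
    return revisions


# Illustrative stream: the first turn contains one premature word ("to")
# that is later revised to "two"; the second turn is never revised.
sample_events = [
    TurnEvent("update", "I want to"),
    TurnEvent("update", "I want two"),
    TurnEvent("end_of_turn", "I want two tickets"),
    TurnEvent("update", "for"),
    TurnEvent("end_of_turn", "for tonight"),
]

finals = final_transcripts(sample_events)
revision_count = count_revisions(sample_events)
```

Under these assumptions, a more conservative model would show up as a lower revision count for the same audio, while the end-of-turn transcripts that downstream code consumes stay the same or improve.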
| Trend | Mentions in This Post | Total Mentions This Month | Posts | Companies | Month-over-Month Change |
|---|---|---|---|---|---|
| Real-time | 4 | 4,546 | 943 | 215 | -38% |
| Voice AI | 3 | 1,325 | 172 | 39 | +140% |
| AI Model Fine-tuning | 1 | 532 | 129 | 59 | -12% |
| LLM | 1 | 3,836 | 662 | 193 | +2% |