The fastest Whisper — with streaming and diarization
Blog post from Baseten
Since 2024, Baseten's Whisper transcription service has steadily improved in speed, accuracy, and cost-efficiency. The latest release adds real-time, speaker-aware transcription while running faster and at lower cost than before. The service is built for production use: it can run with or without streaming, with or without diarization, and on a configurable number of GPUs. The pipeline is built on Baseten Chains and, according to Baseten, delivers significant cost savings and performance improvements over competitors. Streaming audio transcription and speaker annotation serve real-time, voice-driven applications such as live note-taking, content captioning, and customer support, and diarization is particularly suited to speaker-aware conversational AI apps. The technology powers products like Notion's AI Meeting Notes and has been validated under heavy load, maintaining accuracy and cost-efficiency across thousands of concurrent audio streams.