The fastest Whisper — with streaming and diarization

Blog post from Baseten

Post Details
Company: Baseten
Author: Tianshu Cheng and 4 others
Word Count: 935
Language: English
Hacker News Points: -
Summary

Since 2024, Baseten's Whisper transcription service has become steadily faster, more accurate, and more cost-efficient. The latest release adds real-time, speaker-aware transcription that is faster and cheaper still. The service is built for flexible production use: it can be configured for different use cases with or without streaming or diarization, and the number of GPUs it runs on is configurable. Built on Baseten Chains, the transcription pipeline delivers significant cost savings and performance improvements over competing services, and it now supports streaming audio transcription and speaker annotation (diarization) for real-time applications. These capabilities target live note-taking, content captioning, customer support, and other voice-driven applications; diarization in particular suits speaker-aware conversational AI apps. The technology powers products such as Notion's AI Meeting Notes and has been validated under heavy load, maintaining accuracy and cost-efficiency across thousands of concurrent audio streams.
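Speaker annotation means combining two outputs: timestamped transcript segments from the speech model and speaker turns from the diarization model. A common way to merge them, shown here as a minimal sketch and not as Baseten's implementation, is to label each transcript segment with the speaker turn it overlaps most in time. The `Segment` and `SpeakerTurn` types, the `assign_speakers` helper, and the `SPEAKER_00`-style labels are hypothetical; the sketch assumes both models report start and end times in seconds.

```python
from dataclasses import dataclass

# Hypothetical data types for illustration; times are assumed to be in seconds.
@dataclass
class Segment:
    start: float
    end: float
    text: str

@dataclass
class SpeakerTurn:
    start: float
    end: float
    speaker: str

def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if they are disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each transcript segment with the speaker whose turn overlaps it most.

    Segments that overlap no turn at all keep the `unknown` label.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        labeled.append((best_speaker, seg.text))
    return labeled

# Example: three transcript segments and three diarization turns.
segments = [
    Segment(0.0, 2.5, "Hi, thanks for joining."),
    Segment(2.6, 5.0, "Happy to be here."),
    Segment(5.1, 7.0, "Let's get started."),
]
turns = [
    SpeakerTurn(0.0, 2.55, "SPEAKER_00"),
    SpeakerTurn(2.55, 5.05, "SPEAKER_01"),
    SpeakerTurn(5.05, 7.5, "SPEAKER_00"),
]

for speaker, text in assign_speakers(segments, turns):
    print(f"{speaker}: {text}")
```

Maximum overlap is a simple rule that tolerates small boundary mismatches between the two models; word-level timestamps would allow finer-grained assignment when a single segment spans a change of speaker.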