Real-time audio transcription API

Post Details

Company

Gladia

Date Published

Oct. 5, 2023

Author

-

Word Count

1,664

Company Posts That Month

4

Language

English

Hacker News Points

-

Source URL

www.gladia.io/blog/real-time-transcription-powered-by-whisper-asr

Summary

Gladia has launched a real-time audio transcription API that integrates speech recognition and generative AI to provide rapid transcription services, insights, and assistance for various applications, including contact centers and virtual meetings. The API supports over 100 languages and features custom vocabulary, named entity recognition, and sentiment analysis, achieving transcription latency as low as 300 milliseconds. The system utilizes a hybrid ASR/NLP model, leveraging OpenAI's Whisper ASR, which has been reengineered to support real-time transcription using WebSockets and Voice Activity Detection (VAD) technologies. This setup enables low-latency bidirectional communication and precise transcription, making it valuable for industries like customer support, healthcare, finance, and media. The API is designed for scalability and cost-effectiveness, with horizontal scaling and load-balancing strategies to manage high volumes of audio input efficiently. Users can access the API by creating an account on Gladia's platform, where they can find further documentation and support for implementation.
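
As a rough illustration of the Voice Activity Detection step mentioned in the summary, the sketch below splits a stream of PCM samples into speech segments using a simple energy threshold. It is not Gladia's implementation: the function names, frame size, threshold, and hangover values are all hypothetical. Production systems generally rely on trained VAD models rather than raw energy.

```python
from typing import List, Tuple


def frame_energy(frame: List[int]) -> float:
    """Mean squared amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def detect_speech_segments(
    samples: List[int],
    frame_size: int = 160,          # assumed: 10 ms at 16 kHz
    threshold: float = 1_000_000.0,  # assumed energy cutoff, tune per input
    hangover_frames: int = 2,        # silent frames tolerated inside a segment
) -> List[Tuple[int, int]]:
    """Return (start_sample, end_sample) spans judged to contain speech.

    A segment opens on the first frame whose energy reaches the threshold
    and closes once more than `hangover_frames` consecutive frames fall
    below it. Trailing samples that do not fill a frame are ignored.
    """
    segments: List[Tuple[int, int]] = []
    start = None      # sample index where the open segment began
    silent_run = 0    # consecutive silent frames seen inside the open segment
    n_frames = len(samples) // frame_size

    for i in range(n_frames):
        frame = samples[i * frame_size:(i + 1) * frame_size]
        if frame_energy(frame) >= threshold:
            if start is None:
                start = i * frame_size
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run > hangover_frames:
                # Close the segment at the end of the last speech frame.
                end = (i - silent_run + 1) * frame_size
                segments.append((start, end))
                start = None
                silent_run = 0

    if start is not None:
        # The stream ended while a segment was still open.
        segments.append((start, (n_frames - silent_run) * frame_size))
    return segments
```

In a streaming setup, a client or server could run a check like this on incoming audio and pass only the detected speech spans to the recognizer, which is one way to avoid spending transcription time on silence.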

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Real-time	28	2,496	566	185	+13%
Voice AI	1	121	30	15	-61%