Create Speech-to-Text Experiences with ElevenLabs Scribe v2 Realtime & Vision Agents

Post Details

Company

Stream

Date Published

Feb. 6, 2026

Author

Amos G.

Word Count

618

Company Posts That Month

22

Language

English

Hacker News Points

-

Source URL

getstream.io/blog/elevenlabs-scribe-v2-realtime

Summary

ElevenLabs has released Scribe v2 Realtime, a speech-to-text (STT) model with a latency of approximately 150 milliseconds and support for over 90 languages; it also reports the lowest Word Error Rate in several benchmarks. The model targets applications such as live meetings, note-taking, and conversational AI, where real-time accuracy is crucial. Scribe v2 Realtime can transcribe both user speech and agent responses in real time, so conversations proceed without noticeable lag. The setup described in the post pairs ElevenLabs for STT and text-to-speech (TTS) with the Gemini LLM and the Vision Agents framework, and requires API keys from ElevenLabs, Google AI Studio, and Stream. The open-source Vision Agents framework simplifies integrating Scribe v2 Realtime into voice AI applications that need precise live captioning and speech understanding.
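
Since the setup depends on keys from three providers, a quick pre-flight check can catch a missing one before anything starts. The following is a minimal sketch, not code from the post: the environment variable names are assumptions chosen for illustration, and the actual names expected by each SDK should be confirmed in its documentation.

```python
import os

# Assumed (hypothetical) environment variable names. The summary only states
# that keys are required from these three providers, not what they are called.
REQUIRED_KEYS = {
    "ElevenLabs": "ELEVENLABS_API_KEY",
    "Google AI Studio": "GOOGLE_API_KEY",
    "Stream": "STREAM_API_KEY",
}


def missing_keys(env=None):
    """Return the providers whose API key is absent or empty in `env`.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise without touching real settings.
    """
    env = os.environ if env is None else env
    return [
        provider
        for provider, variable in REQUIRED_KEYS.items()
        if not env.get(variable)
    ]
```

Calling `missing_keys()` at startup and stopping when the returned list is non-empty gives a clear error up front instead of an authentication failure partway through a session.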

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Real-time	14	5,046	1,089	214	+11%
LLM	5	5,138	781	181	+34%
Voice AI	3	2,174	187	45	+64%