Multilingual speech-to-text on your laptop: NVIDIA's Nemotron 3.5 ASR

Post Details

Company

LiveKit

Date Published

June 5, 2026

Author

Shayne Parmelee

Word Count

1,651

Company Posts That Month

11

Language

English

Hacker News Points

-

Post removed?

No

Source URL

livekit.com/blog/nemotron-3.5-asr-multilingual-teleprompter

Summary

NVIDIA's Nemotron 3.5 ASR is a 600-million-parameter streaming speech recognition model that transcribes 40 language-locales and is efficient enough to run on a laptop. A language-ID prompt directs decoding, so a single set of weights handles languages such as English, Spanish, and Japanese, and end-of-utterance latency under 100 ms keeps transcripts in step with the spoken words. The post covers using the model through NeMo, OpenAI-compatible servers, and LiveKit voice agents, and emphasizes real-time processing and local execution on CPUs and Apple Silicon, which removes the cloud dependency and reduces cost. Its main demonstration is a multilingual teleprompter: because the model streams with low latency, the script scrolls in sync with the user's voice, helped by a matching algorithm that keeps it accurate and responsive. The post presents the model as distinctive in the multilingual streaming space for its speed and its ability to run locally.
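
The summary does not describe how the teleprompter's matching algorithm works, so the following is only a minimal sketch of one plausible approach, not the post's implementation. It assumes the transcript arrives as whitespace-separated words and that the script position should move forward only when a heard word appears within a few words of the current position. The names `normalize`, `advance_position`, and `window` are hypothetical.

```python
import re


def normalize(word: str) -> str:
    # Lowercase and strip punctuation so "Show," and "show" compare equal.
    return re.sub(r"[^\w]", "", word.lower())


def advance_position(script: list[str], position: int,
                     heard: list[str], window: int = 5) -> int:
    """Return the new script index after processing newly heard words.

    Assumption: each heard word is searched for only in the next `window`
    script words, which tolerates a few skipped words while ignoring
    filler or misrecognized words that do not appear nearby.
    """
    for word in heard:
        target = normalize(word)
        if not target:
            continue
        for offset in range(window):
            idx = position + offset
            if idx >= len(script):
                break
            if normalize(script[idx]) == target:
                # Move just past the matched word; never move backward.
                position = idx + 1
                break
    return position


script = "Welcome to the show, everyone. Today we talk about speech.".split()
pos = advance_position(script, 0, ["welcome", "to", "the"])
pos = advance_position(script, pos, ["um", "show", "everyone"])
```

A bounded look-ahead is one way to trade accuracy against responsiveness: a small window keeps a stray word from jumping the script far ahead, while a larger one recovers faster when the speaker skips a line. Whitespace splitting does not suit languages written without spaces, such as Japanese, which would need character-level or tokenizer-based matching instead.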

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Real-time	11	5,601	1,340	262	-2%
Voice AI	2	3,084	268	57	-11%
LLM	1	6,196	1,155	243	-32%
