How Together AI built the world’s fastest speech-to-text stack

Post Details

Company

Together AI

Date Published

May 29, 2026

Author

Together AI

Word Count

1,646

Company Posts That Month

8

Language

English

Hacker News Points

-

Post removed?

No

Source URL

www.together.ai/blog/how-together-ai-built-the-worlds-fastest-speech-to-text-stack

Summary

Artificial Analysis highlights the complexity of serving Automatic Speech Recognition (ASR) systems, focusing on why audio is harder to process than text. Text inputs are compact and ready for inference, whereas audio needs substantial preprocessing before it reaches the GPU, which makes ASR a full-path systems problem. The piece examines NVIDIA’s Parakeet-TDT 0.6B v3 and OpenAI’s Whisper Large v3, emphasizing efficient GPU execution, CPU preprocessing, and memory management. Key optimizations include running the encoder with TensorRT, using conditional CUDA graphs to streamline decoder operations, and cutting CPU-path overhead with shared memory and evented I/O. The piece stresses controlling both median and tail latency in voice systems, because ASR latency sets the earliest bound on user-visible response time. Parakeet v3, trained on a vast multilingual corpus, expands language support and improves model efficiency over its predecessor.
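The point about median versus tail latency can be illustrated with a minimal sketch. It assumes per-request ASR latencies are collected in milliseconds and uses the nearest-rank definition of a percentile. The `percentile` helper and the sample values are hypothetical and illustrative only; they are not taken from the post or from any measured system.

```python
import math


def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted samples
    return ordered[rank - 1]


# Hypothetical per-request ASR latencies in milliseconds (illustrative, not measured).
latencies = [42, 45, 44, 47, 43, 46, 44, 45, 43, 180]

p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
print(f"p50={p50} ms, p99={p99} ms")  # prints "p50=44 ms, p99=180 ms"
```

In this made-up sample, a single slow request leaves the median at 44 ms but pushes p99 to 180 ms. This is why a voice stack tuned only for median latency can still feel sluggish to the users who land in the tail.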

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM Change
Real-time	9	5,735	1,391	247	-9%
LLM	2	9,074	1,640	224	+53%
Voice AI	2	3,462	242	43	+46%
