20x faster Whisper than OpenAI - Fireworks audio transcribes 1 hour in 4 seconds

Post Details

Company

Fireworks AI

Date Published

Oct. 6, 2025

Author

-

Word Count

1,346

Language

English

Hacker News Points

-

Source URL

fireworks.ai/blog/audio-transcription-launch

Summary

Fireworks has launched a beta of its speech-to-text APIs, built on Whisper v3-large models, with significant speed and cost improvements for audio transcription and translation. The APIs can transcribe one hour of audio in four seconds, the kind of low latency that engaging audio applications depend on. The Fireworks Audio API includes transcription alignment, voice activity detection, and audio preprocessing, and supports use cases such as video captioning, speech model training, and podcast editing. It can be deployed in two ways: serverless, or on dedicated endpoints, which offer greater scalability and production readiness. Fireworks stresses the growing importance of multi-modal, audio-driven AI and showcases compound AI systems that combine audio with other modalities to enrich user experiences. At launch, the service is free for two weeks, and users can try it through a UI playground or in code; dedicated endpoints are available for optimized performance.
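
To put the headline figure in perspective, the short sketch below converts "one hour of audio in four seconds" into a real-time speed factor. The function name `realtime_factor` is a hypothetical helper introduced here for illustration; it is not part of any Fireworks API, and the only inputs are the two numbers quoted in the summary.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return seconds of audio processed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Figures quoted in the summary: one hour of audio, transcribed in four seconds.
factor = realtime_factor(audio_seconds=60 * 60, processing_seconds=4)
print(f"{factor:.0f}x real time")  # prints "900x real time"
```

Under those figures, transcription runs at roughly 900 times real-time speed. This is a best-case number taken from the announcement, not a measured benchmark.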