Company
Date Published
Author
-
Word count
1346
Language
English
Hacker News points
None

Summary

Fireworks has launched a beta of its speech-to-text APIs, built on Whisper v3-large models, which offer significant speed and cost improvements for audio transcription and translation. The APIs can transcribe one hour of audio in four seconds, giving the low latency that engaging audio applications need. The Fireworks Audio API includes transcription alignment, voice activity detection, and audio preprocessing, supporting use cases such as video captioning, speech model training, and podcast editing. The company offers two deployment methods, serverless and dedicated endpoints, with dedicated endpoints providing greater scalability and production readiness. Fireworks emphasizes the growing importance of multi-modal, audio-driven AI and showcases compound AI systems that integrate audio with other modalities to create richer user experiences. The service is free for two weeks, and users can explore it through a UI playground or in code, with dedicated endpoints available for optimized performance.
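
To put the speed figure in perspective, the short sketch below turns "one hour of audio in four seconds" into a speedup over real-time playback. The function name and constants are illustrative only and are not part of any Fireworks API. The calculation takes the quoted numbers at face value.

```python
def speedup_over_real_time(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many times faster than playback speed the audio was processed."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Figures quoted in the summary: one hour of audio transcribed in four seconds.
AUDIO_SECONDS = 60 * 60
PROCESSING_SECONDS = 4

speedup = speedup_over_real_time(AUDIO_SECONDS, PROCESSING_SECONDS)
print(f"{speedup:.0f}x faster than real time")  # prints "900x faster than real time"
```

The ratio covers transcription time only. It says nothing about upload time or network latency, which the summary does not quantify.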