Fireworks has extended its Whisper-based speech transcription service with two features that customers asked for: speaker diarization and a Batch API. Diarization identifies the individual speakers in an audio recording, which is useful for applications such as meeting transcription and phone call analytics, and it does so while preserving the service's scalability and accuracy. The Batch API processes large volumes of audio files at a 40% lower price than typical APIs and returns results within 24 hours, so it suits workloads that do not need an immediate response. Combined with Fireworks’ existing audio services, these additions let developers integrate speech, text, and other modalities into a single AI pipeline for applications such as contact-center analytics and media indexing. Fireworks aims to provide a platform for building, customizing, and scaling AI systems, with flexible deployment options and ongoing support through community channels such as Discord and Twitter.
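
To make the phone call analytics use case concrete, here is a minimal sketch of what can be done with diarized output once it is available. It assumes each transcript segment carries a speaker label plus start and end times in seconds; the field names (`speaker`, `start`, `end`, `text`) and the helper functions are hypothetical and chosen for illustration, not taken from the Fireworks response schema.

```python
from collections import defaultdict

# Hypothetical diarized segments. The field names are assumptions made for
# this example, not the documented Fireworks response format.
segments = [
    {"speaker": "SPEAKER_0", "start": 0.0, "end": 4.5, "text": "Thanks for calling."},
    {"speaker": "SPEAKER_1", "start": 4.5, "end": 10.0, "text": "Hi, I have a billing question."},
    {"speaker": "SPEAKER_0", "start": 10.0, "end": 12.0, "text": "Sure, go ahead."},
]


def talk_time_by_speaker(segments):
    """Sum the duration, in seconds, of each speaker's segments."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


def talk_share(segments):
    """Return each speaker's fraction of the total speaking time."""
    totals = talk_time_by_speaker(segments)
    overall = sum(totals.values())
    if overall == 0:
        # No speech at all: report a zero share for every known speaker.
        return {speaker: 0.0 for speaker in totals}
    return {speaker: seconds / overall for speaker, seconds in totals.items()}


if __name__ == "__main__":
    for speaker, share in talk_share(segments).items():
        print(f"{speaker}: {share:.0%} of speaking time")
```

A contact-center pipeline might compute a metric like this per call, for example to flag conversations in which the agent does most of the talking.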