Fireworks has developed a platform for building customizable, real-time voice agents that integrates automatic speech recognition (ASR), text-to-speech (TTS), and large language models (LLMs) into a single stack. The approach addresses the latency, cost, and complexity of traditional cascaded systems by co-locating the components and applying optimization techniques to achieve sub-500-millisecond response times. The ASR is designed to stay accurate across accents and background noise, and the TTS produces crisp speech with control over pronunciation and voice. The platform also supports advanced LLM capabilities, including following complex instructions and making tool calls, so the end-to-end experience is fully customizable. The platform is currently in beta. Fireworks provides a free, limited-access endpoint for demonstrations and invites potential design partners to collaborate on optimizing their voice agent stacks.
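To make the cascaded structure concrete, the following is a minimal sketch of one conversational turn passing through ASR, LLM, and TTS stages in sequence, with per-stage timing checked against the 500 ms target mentioned above. It is not Fireworks' implementation or API: `fake_asr`, `fake_llm`, `fake_tts`, `run_turn`, and `TurnResult` are hypothetical names, and the stages are stand-in stubs that would be replaced by real model calls.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict

BUDGET_MS = 500.0  # response-time target quoted in the text


@dataclass
class TurnResult:
    """Output audio for one turn plus the time spent in each stage."""
    audio: bytes
    stage_ms: Dict[str, float] = field(default_factory=dict)

    @property
    def total_ms(self) -> float:
        # In a cascaded pipeline the stages run back to back,
        # so the end-to-end latency is the sum of the stage latencies.
        return sum(self.stage_ms.values())

    @property
    def within_budget(self) -> bool:
        return self.total_ms < BUDGET_MS


# Hypothetical stand-ins for real components (assumptions, not a real API).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")      # pretend the audio is already text


def fake_llm(text: str) -> str:
    return f"You said: {text}"        # trivial "response generation"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")       # pretend the text is synthesized audio


def run_turn(
    audio: bytes,
    asr: Callable[[bytes], str] = fake_asr,
    llm: Callable[[str], str] = fake_llm,
    tts: Callable[[str], bytes] = fake_tts,
) -> TurnResult:
    """Run ASR -> LLM -> TTS sequentially and record each stage's latency."""
    stage_ms: Dict[str, float] = {}
    value = audio
    for name, stage in (("asr", asr), ("llm", llm), ("tts", tts)):
        start = time.perf_counter()
        value = stage(value)
        stage_ms[name] = (time.perf_counter() - start) * 1000.0
    return TurnResult(audio=value, stage_ms=stage_ms)


if __name__ == "__main__":
    result = run_turn(b"hello")
    print(result.audio, result.stage_ms, result.within_budget)
```

Because the stages run back to back, any network hop or queueing delay between them adds directly to `total_ms`, which is the motivation the text gives for co-locating the components.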