GPT Realtime 2 Is Here — And Preambles Change How Voice Agents Feel
Blog post from Agora
OpenAI's GPT Realtime 2 is the latest version of its speech-to-speech model. It improves instruction following and multilingual capability. It also introduces a new feature called "preambles": short spoken acknowledgments that the model emits while it is still reasoning. Earlier models typically stayed silent during this phase. Preambles keep the conversation audibly moving instead, which reduces perceived response time and tells the user that the request is being processed. They also support smoother transitions between languages and let the model work through complex queries without needing an immediate tool call.

Realtime 2 is also more expressive and steerable, adapting its tone to match the user's input. It also offers a longer context window for sustained interactions. Together, these improvements make it a strong choice for production-grade voice agents. Agora integrates the model into its Conversational AI Engine to support low-latency, reasoning-first voice applications for real-time communication.