Over the past year, an Agora team member has been building voice AI agents and recording the challenges and lessons that came up along the way. The work integrated Agora's real-time voice infrastructure with OpenAI’s Realtime API and ElevenLabs Agents, and produced projects ranging from an AI companion for kids to a food-ordering assistant. One significant challenge was session persistence, which led to a custom load balancer built on Redis. Another discovery was that an agent's spoken output can differ from its text transcription, so accurate auditing has to account for what was actually said. The introduction of Agora's Conversational AI Engine simplified development and shifted the focus to more complex agent behavior built on cascading architectures and function calls. The exploration then extended to multi-agent systems capable of real-world tasks, which underscored how much the transport protocol matters: UDP performed better than WebSockets. The experience also showed that while off-the-shelf models can be limiting, multi-agent systems with narrowly focused roles and robust prompts can significantly extend what an agent can do, at the cost of new problems such as managing data for Retrieval-Augmented Generation. The overall lesson is that building, testing, and refining remain the most reliable way to navigate the fast-changing landscape of voice AI.
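The summary does not say how the Redis-backed load balancer works, so the following is only a minimal sketch of one common way to get session persistence: pin each session ID to a backend in a shared key-value store with a time-to-live. Everything here is an assumption made for illustration, including the names `InMemoryStore` and `StickyRouter`, the `session:` key prefix, the round-robin choice, and the 300-second TTL. An in-memory class stands in for Redis so the example runs without a server; its `set_if_absent` method plays the role of the Redis command `SET key value NX EX ttl`.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class InMemoryStore:
    """Hypothetical stand-in for Redis: string keys with a time-to-live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave as if unset
            return None
        return value

    def set_if_absent(self, key: str, value: str, ttl: float) -> bool:
        """Write only if the key is unset (like Redis SET ... NX EX)."""
        if self.get(key) is not None:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True

    def touch(self, key: str, ttl: float) -> None:
        """Refresh the expiry of a live key (like Redis EXPIRE)."""
        value = self.get(key)
        if value is not None:
            self._data[key] = (value, self._clock() + ttl)


class StickyRouter:
    """Routes every request of a session to the same backend."""

    def __init__(self, store: InMemoryStore, backends: List[str],
                 ttl: float = 300.0) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._store = store
        self._backends = list(backends)
        self._ttl = ttl
        self._next = 0  # round-robin cursor for new sessions

    def route(self, session_id: str) -> str:
        key = f"session:{session_id}"
        pinned = self._store.get(key)
        if pinned is not None:
            # Known session: keep it on its backend and extend the pin.
            self._store.touch(key, self._ttl)
            return pinned
        candidate = self._backends[self._next % len(self._backends)]
        self._next += 1
        if self._store.set_if_absent(key, candidate, self._ttl):
            return candidate
        # Another balancer instance pinned the session first; use its choice.
        return self._store.get(key) or candidate


if __name__ == "__main__":
    router = StickyRouter(InMemoryStore(), ["agent-a:8080", "agent-b:8080"])
    first = router.route("call-123")
    print(first, router.route("call-123") == first)
```

The write-if-absent step matters when several balancer instances share one store: whichever instance pins the session first wins, and the others follow that choice instead of splitting a live conversation across backends. The TTL lets abandoned sessions age out without explicit cleanup.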