
The End of the Orb: Building AI Agents That Feel Present

Blog post from Stream

Post Details
Company: Stream
Author: Nash R.
Word Count: 2,744
Language: English
Summary

A new open-source conversational agent targets two limitations of most current voice agents: they have no visual presence, and they have no awareness of the user's emotional state. The agent uses Vision Agents for orchestration, Inworld's expressive TTS-2 for voice modulation, Anam for a lip-synced avatar, MediaPipe for face tracking, Gemini as the language model, and Deepgram for speech-to-text, all running in real time over Stream's edge network. By detecting the user's facial emotion, gaze, and engagement, the agent adapts its responses to the user's emotional state, which makes the conversation feel more personal and interactive. The technology has potential applications in interview coaching, education, and customer support, where real-time emotional feedback can improve the interaction. Because the system is modular, individual components can be swapped or scaled independently, making it a flexible foundation for building emotionally aware agents.
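The core idea is that visual signals from face tracking shape both what the language model says and how the voice sounds. Below is a minimal, framework-free sketch of that data flow under stated assumptions: every name in it (`FaceSignals`, `describe_user_state`, `pick_voice_style`, `run_turn`, the style labels, and the 0.4 engagement threshold) is hypothetical. None of it is the Vision Agents, MediaPipe, Deepgram, Gemini, or Inworld API. The stub classes stand in for real speech-to-text, LLM, and TTS services, and avatar rendering is omitted.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class FaceSignals:
    """Hypothetical summary of what a face tracker might report for the current moment."""
    emotion: str          # e.g. "neutral", "confused", "happy"
    gaze_on_screen: bool  # is the user looking at the avatar?
    engagement: float     # 0.0 (disengaged) to 1.0 (fully engaged)


def describe_user_state(s: FaceSignals) -> str:
    """Turn visual signals into a short note the language model can condition on."""
    notes = [f"The user appears {s.emotion}."]
    if not s.gaze_on_screen:
        notes.append("They are looking away from the screen.")
    if s.engagement < 0.4:  # hypothetical threshold
        notes.append("Engagement is low; keep the reply short and ask a question.")
    return " ".join(notes)


def pick_voice_style(s: FaceSignals) -> str:
    """Map the detected emotion to a hypothetical TTS style label."""
    return {"confused": "calm", "sad": "warm", "happy": "upbeat"}.get(s.emotion, "neutral")


# Each stage sits behind a small interface so providers can be swapped independently.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, user_text: str, user_state: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str, style: str) -> bytes: ...


def run_turn(audio: bytes, signals: FaceSignals,
             stt: SpeechToText, llm: LanguageModel, tts: TextToSpeech) -> bytes:
    """One conversational turn: transcribe, read the face, generate a reply, speak it."""
    text = stt.transcribe(audio)
    state = describe_user_state(signals)
    answer = llm.reply(text, state)
    return tts.synthesize(answer, pick_voice_style(signals))


# Stub providers used only to exercise the pipeline.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode()


class TemplateLLM:
    def reply(self, user_text: str, user_state: str) -> str:
        return f"[{user_state}] You said: {user_text}"


class TagTTS:
    def synthesize(self, text: str, style: str) -> bytes:
        return f"<{style}>{text}".encode()


out = run_turn(b"hi", FaceSignals("confused", False, 0.2), EchoSTT(), TemplateLLM(), TagTTS())
```

Because each stage is reached only through a one-method interface, replacing one provider (for example, a different speech-to-text service) means implementing that single method. This is one plausible way to get the modularity the summary describes, not necessarily how the project itself is structured.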