Content Deep Dive

Talk to a Statue: Building A Multi-Modal ElevenAgents-Powered App

Blog post from ElevenLabs

Post Details
Company: ElevenLabs
Date Published: -
Author: Creative Platform
Word Count: 2,104
Language: English
Hacker News Points: -
Summary

The blog post describes a multi-modal app that lets users photograph statues and hold real-time voice conversations with the figures they depict, built on ElevenLabs' Voice Design and Agent APIs. The app combines computer vision with voice generation to turn public monuments into interactive experiences. Its pipeline captures an image, identifies the artwork and its characters with an OpenAI vision model, researches the historical context, generates a unique voice for each character via the ElevenLabs API, and streams the voice interaction over WebRTC. From each photo, the system extracts the artwork's name, location, and artist, along with a detailed voice description per character to drive accurate voice synthesis and heighten the realism of the conversation. The app aims to be both fun and educational, demonstrating how combining AI modalities can power creative and informative applications.
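The pipeline the post describes can be sketched roughly as follows. This is a hedged illustration, not the post's actual code: every function name, the `ArtworkProfile` structure, and the example values are hypothetical stand-ins, with the real vision, Voice Design, and WebRTC calls replaced by stubs so the sketch runs offline.

```python
from dataclasses import dataclass, field

@dataclass
class ArtworkProfile:
    """Details the post says the vision step extracts from a photo."""
    name: str
    location: str
    artist: str
    characters: list = field(default_factory=list)
    voice_descriptions: dict = field(default_factory=dict)  # per-character

def identify_artwork(image_bytes: bytes) -> ArtworkProfile:
    # Stand-in for the OpenAI vision call; the real app would send
    # the photo to a model and parse its structured response.
    return ArtworkProfile(
        name="David",
        location="Florence",
        artist="Michelangelo",
        characters=["David"],
        voice_descriptions={"David": "young, confident, resolute tone"},
    )

def design_voices(profile: ArtworkProfile) -> dict:
    # Stand-in for ElevenLabs Voice Design: one voice id per character,
    # derived from its voice description.
    return {c: f"voice-for-{c.lower()}" for c in profile.characters}

def build_session(image_bytes: bytes) -> dict:
    profile = identify_artwork(image_bytes)
    voices = design_voices(profile)
    # The real app would now open a WebRTC session with an ElevenLabs
    # Agent configured with the generated voice and researched context.
    return {"artwork": profile.name, "voices": voices}

session = build_session(b"...photo bytes...")
```

Keeping the extracted details in one structure like `ArtworkProfile` makes it easy to hand the same context to both the voice-design step and the conversational agent.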