Content Deep Dive

Talk to a Statue: Building A Multi-Modal ElevenAgents-Powered App

Blog post from ElevenLabs

Post Details
Company: ElevenLabs
Date Published: -
Author: Creative Platform
Word Count: 2,104
Language: English
Hacker News Points: -
Summary

The blog post describes a multi-modal app that lets users photograph statues and hold real-time voice conversations with the figures they depict, built on ElevenLabs' Voice Design and Agent APIs. The app combines computer vision with voice generation to turn public monuments into interactive experiences. Its pipeline captures an image, identifies the artwork and its characters with an OpenAI vision model, researches the historical context, generates a unique voice for each character via the ElevenLabs API, and streams the voice interaction over WebRTC. From each photo, the system extracts the artwork's name, location, and artist, along with a detailed voice description per character to drive accurate voice synthesis and heighten the realism of the conversation. The app aims to be both fun and educational, demonstrating how combining AI modalities can power creative and informative applications.
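The pipeline the post describes can be sketched roughly as follows. This is a hedged illustration, not the post's actual code: every function name, the `ArtworkProfile` structure, and the example values are hypothetical stand-ins, with the real vision, Voice Design, and WebRTC calls replaced by stubs so the sketch runs offline.

```python
from dataclasses import dataclass, field

@dataclass
class ArtworkProfile:
    """Details the post says the vision step extracts from a photo."""
    name: str
    location: str
    artist: str
    characters: list = field(default_factory=list)
    voice_descriptions: dict = field(default_factory=dict)  # per-character

def identify_artwork(image_bytes: bytes) -> ArtworkProfile:
    # Stand-in for the OpenAI vision call; the real app would send
    # the photo to a model and parse its structured response.
    return ArtworkProfile(
        name="David",
        location="Florence",
        artist="Michelangelo",
        characters=["David"],
        voice_descriptions={"David": "young, confident, resolute tone"},
    )

def design_voices(profile: ArtworkProfile) -> dict:
    # Stand-in for ElevenLabs Voice Design: one voice id per character,
    # derived from its voice description.
    return {c: f"voice-for-{c.lower()}" for c in profile.characters}

def build_session(image_bytes: bytes) -> dict:
    profile = identify_artwork(image_bytes)
    voices = design_voices(profile)
    # The real app would now open a WebRTC session with an ElevenLabs
    # Agent configured with the generated voice and researched context.
    return {"artwork": profile.name, "voices": voices}

session = build_session(b"...photo bytes...")
```

Keeping the extracted details in one structure like `ArtworkProfile` makes it easy to hand the same context to both the voice-design step and the conversational agent.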