Gemini Live API & Lyria 3: Generate Music From Text, Phone & Video Calls

Blog post from Stream

Post Details
Company
Stream
Date Published
Author
Amos G.
Word Count
4,437
Language
English
Summary

Google DeepMind's Lyria 3 is an AI music-generation model that accepts multimodal prompts, such as text, images, and voice, and is available through the Gemini API. It can produce both short 30-second clips and full-length songs from an input prompt, covering use cases such as soundtracks, ambient tracks, and cinematic pieces. Integrated with Vision Agents, it lets users generate music during video calls or during phone calls placed via Twilio, with the agent responding by voice in real time. The setup involves configuring several components, including ngrok to give the local server a public URL, and requires API keys for the services involved. Users can customize the output through creative prompt crafting and their environment setup.