Gemini Live API & Lyria 3: Generate Music From Text, Phone & Video Calls
Blog post from Stream
Google DeepMind's Lyria 3 is an AI model that generates music from multimodal prompts, such as text, images, and voice, and is available through the Gemini API. It can produce both short 30-second clips and full-length songs from a prompt, covering use cases such as soundtracks, ambient tracks, and cinematic pieces. Integrating it with Vision Agents lets users generate music during video calls, or during phone calls via Twilio, with the agent responding by voice in real time. The setup involves configuring several services, including ngrok to expose a local server through a public URL, and requires API keys. Users can shape the output through prompt crafting and environment configuration.
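Because the setup depends on API keys for several services, it can help to check that they are all present before starting the agent. The following is a minimal sketch of such a check, not the post's own code: the environment variable names and the `missing_keys` helper are assumptions chosen for illustration, so substitute whatever names your own configuration uses.

```python
import os
from typing import Mapping, Sequence

# Hypothetical variable names, assumed for illustration only.
# The real names depend on how each service is configured.
REQUIRED_KEYS = (
    "GOOGLE_API_KEY",      # Gemini API (Lyria 3)
    "STREAM_API_KEY",      # Stream / Vision Agents
    "STREAM_API_SECRET",
    "TWILIO_ACCOUNT_SID",  # phone calls via Twilio
    "TWILIO_AUTH_TOKEN",
    "NGROK_AUTHTOKEN",     # public URL for the local server
)


def missing_keys(
    env: Mapping[str, str],
    required: Sequence[str] = REQUIRED_KEYS,
) -> list[str]:
    """Return the required names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys(os.environ)
    if missing:
        print("Missing credentials: " + ", ".join(missing))
    else:
        print("All credentials present.")
```

Running the script before launching the agent reports any unset credentials in one pass, rather than surfacing them one at a time as each service fails to connect.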