Build a Vision AI Agent with Gemini 3 in < 3 Minutes

Post Details

Company

Stream

Date Published

Dec. 3, 2025

Author

Amos G.

Word Count

689

Company Posts That Month

32

Language

English

Hacker News Points

-

Source URL

getstream.io/blog/vision-agent-gemini-3

Summary

Vision Agents, an open-source Python framework for real-time voice and video AI applications, now supports Google's Gemini 3 models. A short video demonstration shows how to build a vision-enabled voice agent using only Python: the agent analyzes a screen share or webcam feed, reasons with Gemini 3 Pro Preview, and holds a natural conversation. The process involves installing Vision Agents alongside the Gemini plugin, setting gemini-3-pro-preview as the LLM, and building a live video-call agent that describes on-screen content in real time. The walkthrough covers setting up a project, installing the necessary plugins, and modifying a Python script so that the agent observes and responds to the camera feed. Because the framework needs no complex frontend setup, users can explore interactive voice and video experiences, with enhanced reasoning and multimodal understanding, after minimal setup.
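
The observe-and-respond behavior described in the summary can be pictured with a small, library-free Python sketch. This is not the Vision Agents API: `Frame`, `VisionAgentSketch`, `fake_describe`, and the `every_n` sampling parameter are hypothetical names introduced only for illustration, and the model call is replaced by a stub so the example runs offline. The only detail taken from the summary is the model identifier gemini-3-pro-preview.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

# Model identifier named in the summary; everything else here is illustrative.
MODEL_NAME = "gemini-3-pro-preview"


@dataclass
class Frame:
    """Hypothetical stand-in for one captured webcam or screen frame."""
    index: int
    pixels: bytes


@dataclass
class VisionAgentSketch:
    """Hypothetical observe-and-respond loop (not the Vision Agents API)."""
    describe: Callable[[str, Frame], str]  # stand-in for the multimodal model call
    model: str = MODEL_NAME
    every_n: int = 2  # assumption: sample every n-th frame, not every frame
    replies: List[str] = field(default_factory=list)

    def run(self, frames: Iterable[Frame]) -> List[str]:
        # Pass each sampled frame to the model stub and collect its reply.
        for frame in frames:
            if frame.index % self.every_n == 0:
                self.replies.append(self.describe(self.model, frame))
        return self.replies


def fake_describe(model: str, frame: Frame) -> str:
    """Offline stub; a real agent would send the frame to the model instead."""
    return f"[{model}] frame {frame.index}: {len(frame.pixels)} bytes seen"


frames = [Frame(i, b"\x00" * 4) for i in range(4)]
agent = VisionAgentSketch(describe=fake_describe)
replies = agent.run(frames)
for reply in replies:
    print(reply)
```

In a real project, the stub would be replaced by the framework's Gemini plugin and the frames would come from the live video call; the post's video and the Vision Agents documentation are the sources for the actual class names and setup steps.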

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	5	3,775	638	202	-32%
AI Agents	2	2,834	598	185	-18%
Real-time	2	7,285	1,202	224	+60%
Voice AI	1	552	97	35	-50%