Open Vision Agents by Stream: Open Source SDK for Building Low-Latency Vision AI Apps

Post Details

Company

Stream

Date Published

Oct. 10, 2025

Author

Thierry S.

Word Count

979

Company Posts That Month

18

Language

English

Hacker News Points

-

Source URL

getstream.io/blog/vision-agents-by-stream

Summary

Vision Agents is an open-source framework developed by Stream for building low-latency vision AI applications, with a focus on real-time voice and video models such as OpenAI Realtime and Gemini Live. Integration goes through a generic Agent class that handles tracks, video subscriptions, and response type conversions. The framework supports text-to-speech, speech-to-text, and speech-to-speech models, and lets developers plug in their preferred large language models (LLMs). Vision Agents is built video-first: it prioritizes real-time video processing over WebRTC and provides customizable video processors for tasks such as pose detection and anomaly detection in manufacturing. Supported applications include sports coaching, meeting assistance, and accessibility features, along with integration with robotics and IoT. The design allows natural interactions by combining visual and auditory data processing, and includes built-in memory and context retention across sessions. The project encourages community involvement and collaboration with AI companies to expand its support for AI models and services.
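To make the "video processors feeding an agent" idea above concrete, here is a minimal, library-free sketch. It does not use the Vision Agents SDK: `SketchAgent`, `Frame`, `mean_brightness`, and `bright_pixel_ratio` are hypothetical names invented for illustration, and a real processor would run something like a pose model rather than pixel arithmetic. The sketch only shows the general pattern of running pluggable processors over a frame and merging their output with the spoken transcript before it reaches an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration only; these are NOT the Vision Agents API.
Frame = List[List[int]]  # a tiny grayscale image: rows of 0-255 pixel values
Processor = Callable[[Frame], Dict[str, float]]


def _pixels(frame: Frame) -> List[int]:
    """Flatten a frame into a single list of pixel values."""
    return [p for row in frame for p in row]


def mean_brightness(frame: Frame) -> Dict[str, float]:
    """Toy processor: average pixel value (0.0 for an empty frame)."""
    px = _pixels(frame)
    return {"mean_brightness": sum(px) / max(len(px), 1)}


def bright_pixel_ratio(frame: Frame) -> Dict[str, float]:
    """Toy processor: fraction of pixels brighter than 200."""
    px = _pixels(frame)
    return {"bright_ratio": sum(1 for p in px if p > 200) / max(len(px), 1)}


@dataclass
class SketchAgent:
    """Hypothetical agent that combines processor output with a transcript."""

    processors: List[Processor] = field(default_factory=list)

    def build_context(self, frame: Frame, transcript: str) -> str:
        # Run every processor on the frame and merge their observations.
        observations: Dict[str, float] = {}
        for processor in self.processors:
            observations.update(processor(frame))
        details = ", ".join(
            f"{name}={value:.2f}" for name, value in sorted(observations.items())
        )
        # This string stands in for the prompt context an LLM would receive.
        return f"User said: {transcript!r}. Frame observations: {details}"


if __name__ == "__main__":
    agent = SketchAgent(processors=[mean_brightness, bright_pixel_ratio])
    print(agent.build_context([[0, 255], [255, 0]], "how is my lighting?"))
```

Keeping processors as plain callables means a new detector can be added without touching the agent itself, which is the kind of customization the summary describes.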

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Real-time	13	6,551	1,245	236	+61%
LLM	11	4,863	783	205	+34%
AI Agents	1	3,102	615	183	+29%
Voice AI	1	971	139	44	+45%