Vision Agents v0.3: Deployments, HTTP Support, & 10 New Plugins
Blog post from Stream
Vision Agents v0.3 builds on v0.2 with a focus on running multimodal AI agents in production. The release adds infrastructure for HTTP APIs, observability, and session management, along with real-world integrations, so that agents can be deployed for tasks ranging from customer support to yoga instruction.

Agents can now be deployed as HTTP services using FastAPI, with built-in REST endpoints, session management, and observability through tools like Prometheus, which monitors LLM, speech-to-text (STT), text-to-speech (TTS), and other metrics.

The update also ships 10 new plugins, including integrations from AWS, NVIDIA, and HuggingFace. Phone integration via Twilio and retrieval-augmented generation (RAG) via Turbopuffer support handling customer interactions. The release showcases security applications, with examples of real-time face recognition and package detection, and offers a suite of tools from Gemini that extend agents beyond simple Q&A.

Vision Agents v0.3 is open source and welcomes community contributions. Examples and deployment guidance are provided to help new users get started.
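To make the HTTP deployment model described above more concrete, here is a minimal, framework-free sketch of the three pieces the release names: REST-style endpoints, session management, and Prometheus-readable metrics. It is not the Vision Agents API and does not use FastAPI. The `SessionRegistry` class, the `handle` function, the `/sessions` and `/metrics` routes, the `call_id` field, and the metric names are all assumptions made up for illustration.

```python
import json
import uuid
from dataclasses import dataclass, field


@dataclass
class SessionRegistry:
    """In-memory store of active agent sessions (hypothetical, for illustration)."""

    sessions: dict = field(default_factory=dict)
    started_total: int = 0  # counter: only ever increases

    def start(self, call_id: str) -> str:
        """Register a new session and return its generated id."""
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = {"call_id": call_id}
        self.started_total += 1
        return session_id

    def stop(self, session_id: str) -> bool:
        """Remove a session; return False if the id was unknown."""
        return self.sessions.pop(session_id, None) is not None

    def metrics_text(self) -> str:
        """Render the counters in the Prometheus text exposition format."""
        lines = [
            "# TYPE agent_sessions_started_total counter",
            f"agent_sessions_started_total {self.started_total}",
            "# TYPE agent_sessions_active gauge",
            f"agent_sessions_active {len(self.sessions)}",
        ]
        return "\n".join(lines) + "\n"


def handle(registry, method, path, body=None):
    """Map a request to (status_code, response_body). Routes are assumptions."""
    if method == "POST" and path == "/sessions":
        call_id = (body or {}).get("call_id", "")
        if not call_id:
            return 400, json.dumps({"error": "call_id is required"})
        return 201, json.dumps({"session_id": registry.start(call_id)})
    if method == "DELETE" and path.startswith("/sessions/"):
        session_id = path.rsplit("/", 1)[-1]
        if registry.stop(session_id):
            return 204, ""
        return 404, json.dumps({"error": "unknown session"})
    if method == "GET" and path == "/metrics":
        return 200, registry.metrics_text()
    return 404, json.dumps({"error": "not found"})
```

In a real deployment, a web framework such as FastAPI would own routing and request parsing, and a metrics client library would track per-component measurements (LLM, STT, TTS). The sketch only shows how session lifecycle and a scrape endpoint fit together in one service.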