Building the Next Generation of Physical Agents with Gemini Robotics-ER 1.5
Blog post from Google Cloud
Gemini Robotics-ER 1.5, now available in preview through Google AI Studio and the Gemini API, is a model that gives robots embodied reasoning capabilities. Its strengths are visual and spatial understanding, task planning, and progress estimation, which suit it to multi-step tasks that depend on contextual information, such as sorting objects into recycling bins according to local guidelines. The model is optimized for fast spatial reasoning: it generates precise 2D points and uses spatial and temporal reasoning to orchestrate agentic behaviors. Users control the trade-off between latency and accuracy, letting the model think longer on complex tasks or respond quickly on simple ones. Gemini Robotics-ER 1.5 also adds improved semantic safety filters and awareness of physical constraints, which support safer operation within defined parameters. Acting as a high-level reasoning engine for a robot, it calls tools and APIs to carry out tasks, and it is reported to perform strongly on both academic and internal benchmarks.
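To make the 2D pointing capability concrete, here is a minimal sketch of how a robot stack might turn the model's points into pixel coordinates for a camera frame. The post does not specify the output format, so this is an assumption: the model is taken to return a JSON list of objects with a "label" and a "point" given as [y, x], normalized to a 0-1000 range. The names `PixelPoint` and `to_pixel_points` are hypothetical helpers for illustration, not part of any Google API, and no model call is made here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelPoint:
    """A labeled point in image pixel coordinates (hypothetical helper type)."""
    label: str
    x: int
    y: int


def to_pixel_points(response_text, width, height, scale=1000):
    """Convert normalized points from a model response into pixel coordinates.

    Assumed input format (not confirmed by the post): a JSON list such as
    [{"point": [y, x], "label": "..."}], with y and x in the range 0..scale.
    """
    points = []
    for item in json.loads(response_text):
        y_norm, x_norm = item["point"]  # assumed order: y first, then x
        x = int(x_norm * width / scale)
        y = int(y_norm * height / scale)
        # Clamp so a value at the upper bound still indexes a valid pixel.
        x = min(max(x, 0), width - 1)
        y = min(max(y, 0), height - 1)
        points.append(PixelPoint(label=item.get("label", ""), x=x, y=y))
    return points


if __name__ == "__main__":
    # Hand-written sample text standing in for a model response.
    sample = '[{"point": [500, 250], "label": "soda can"}]'
    for p in to_pixel_points(sample, width=640, height=480):
        print(p.label, p.x, p.y)
```

Keeping the conversion in one small function means the rest of the pipeline, for example a grasp planner, only ever sees pixel coordinates. If the real response format differs from the assumption above, only this function needs to change.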