Will World Models Eat Physical AI? What We Learned from Our Physical AI Panel
Blog post from Encord
Physical AI is advancing rapidly, with growing attention on world models, which predict how the environment changes as a result of a robot's actions, in contrast to vision-language-action models (VLAs), which predict only the robot's actions. This shift in what is being predicted makes it possible to learn from diverse data sources, including internet videos and failure data. Improvements in AI-generated video quality from models such as Cosmos and Genie 3 have made such data viable for training and broader applications, creating a reinforcing cycle of better models and better data.

The discussion highlighted how complex it is to deploy physical AI and emphasized the robotics data flywheel: deploy robots, collect data, improve the models, and repeat. World models strengthen this loop by making efficient use of robot-specific data and failure data, both of which are traditionally underutilized.

The conversation also touched on whether to build generalist or specialist models, with the consensus leaning toward generalist models that can be fine-tuned for specific tasks. Deployment challenges were framed as systems problems that require robust observability and performance-monitoring tools, which underscores the importance of treating world models as core infrastructure for the data and evaluation loops in physical AI development.
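
The contrast between predicting actions and predicting environmental changes, and the reason failure data becomes usable, can be illustrated with a minimal Python sketch. It assumes a toy one-dimensional robot whose position moves by a fixed fraction of the commanded action. Every name here (`Transition`, `policy_dataset`, `world_model_dataset`, `fit_linear_dynamics`, `LOG`) is hypothetical and does not come from the panel or from any library, and restricting action-prediction data to successful episodes is a simplifying assumption about imitation-style training, not a claim about how any particular VLA is trained.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Transition:
    obs: float       # toy 1-D state, e.g. gripper position
    action: float    # commanded displacement
    next_obs: float  # state observed after the action
    success: bool    # did the episode this step came from reach its goal?


def policy_dataset(log: List[Transition]) -> List[Tuple[float, float]]:
    """Action-prediction data (obs -> action).

    Assumption: only successful episodes are kept, because imitating
    the actions of a failed episode would teach the wrong behavior.
    """
    return [(t.obs, t.action) for t in log if t.success]


def world_model_dataset(
    log: List[Transition],
) -> List[Tuple[Tuple[float, float], float]]:
    """Dynamics data ((obs, action) -> next_obs).

    Every step is a valid example of how the world responds to an
    action, whether or not the episode succeeded.
    """
    return [((t.obs, t.action), t.next_obs) for t in log]


def fit_linear_dynamics(data: List[Tuple[Tuple[float, float], float]]) -> float:
    """Least-squares fit of `gain` in next_obs = obs + gain * action."""
    numerator = sum(a * (nxt - o) for (o, a), nxt in data)
    denominator = sum(a * a for (_, a), _ in data)
    return numerator / denominator


# Hypothetical log: two steps from a successful episode, two from a failure.
# The toy world applies half of each commanded displacement.
LOG = [
    Transition(obs=0.0, action=1.0, next_obs=0.5, success=True),
    Transition(obs=0.5, action=1.0, next_obs=1.0, success=True),
    Transition(obs=0.0, action=-2.0, next_obs=-1.0, success=False),
    Transition(obs=-1.0, action=4.0, next_obs=1.0, success=False),
]

if __name__ == "__main__":
    print("policy examples:", len(policy_dataset(LOG)))
    print("world-model examples:", len(world_model_dataset(LOG)))
    print("fitted gain:", fit_linear_dynamics(world_model_dataset(LOG)))
```

In this toy log the action-prediction dataset keeps two of the four steps, while the dynamics dataset keeps all four, and the failed steps contribute to recovering the gain of 0.5. A real world model would predict images or latent states rather than a scalar, but the data-usage argument has the same shape.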