
VLAs and World Models Need Web-Scale Data. Just Not the Same Data

Blog post from Bright Data

Post Details
Company: Bright Data
Date Published:
Author: Adam Chan
Word Count: 1,755
Company Posts That Month: 19
Language: English
Hacker News Points: -
Summary

A recent event at Bright Data’s Web Data Loft in San Francisco brought together engineers from leading robotics and AI companies to explore the transition from language models to real-world robotic applications. The discussion, moderated by Adam of HackerSquad and the Builders Collective, stressed that training corpora matter as much as model design when developing Vision-Language-Action (VLA) models: architecture is not the sole bottleneck. VLAs begin as vision-language models trained on large-scale internet data and are then fine-tuned on robotic data, which helps them generalize better. The conversation also covered how vision, language, and action are integrated into a unified token space, how the data needs of VLAs differ from those of world models, and how data sources for training rank in a hierarchy. A consensus emerged that broad web-scale data provides foundational world understanding, while robot-specific data is crucial for execution. Participants noted that robotics still lacks reliable scaling laws like those established for LLMs, which makes rapid data curation and discovery all the more important for effective model training.

Trends Found in this Post
Trend | Post Mentions | Total Month Mentions | Posts | Companies | MoM
LLM | 9 | 5,172 | 1,006 | 220 | -43%
AI Agents | 2 | 4,874 | 1,103 | 240 | -1%
Data Pipeline | 1 | 441 | 203 | 86 | -29%
Reinforcement learning | 1 | 59 | 31 | 19 | -34%