Raw Robot Video to VLA-Ready Training Data: Annotating LeRobot Datasets with Nomadic and HuggingFace Buckets
Blog post from HuggingFace
This article describes how to turn raw robot video into richly annotated data ready for training Vision-Language-Action (VLA) models, using the Nomadic platform and HuggingFace Buckets. It argues that VLA training depends on high-quality data and identifies common problems in community-contributed datasets, such as incomplete annotations and a lack of temporal detail. Nomadic addresses these problems with tools for detailed timestamping, accurate object identification, and scene segmentation. HuggingFace Buckets supplies storage that integrates with Nomadic, so large volumes of robot video can be managed and accessed efficiently. Together, the two make datasets easier to standardize and curate, which in turn facilitates multi-dataset training and improves VLA training quality. The broader aim is to advance the capability and accuracy of robot training systems by connecting data collection, storage, and annotation platforms.
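To make "incomplete annotations and lack of temporal detail" concrete, here is a minimal sketch of what a timestamped annotation record and a coverage check might look like. The `Segment` schema and the `validate_segments` helper are hypothetical and written only for illustration; they are not the Nomadic or LeRobot annotation format, which this summary does not specify.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """One annotated span of an episode (hypothetical schema, for illustration)."""
    start_s: float                 # segment start, seconds from episode start
    end_s: float                   # segment end, seconds from episode start
    description: str               # language description of the action
    objects: List[str] = field(default_factory=list)  # objects involved


def validate_segments(segments: List[Segment], episode_length_s: float) -> List[str]:
    """Return a list of problems: reversed bounds, overlaps, and uncovered gaps."""
    problems = []
    cursor = 0.0  # end of the time range covered so far
    for seg in sorted(segments, key=lambda s: s.start_s):
        if seg.end_s <= seg.start_s:
            problems.append(f"empty or reversed segment: {seg.description!r}")
            continue
        if seg.start_s < cursor:
            problems.append(f"overlap before {seg.description!r}")
        elif seg.start_s > cursor:
            problems.append(f"gap from {cursor:.1f}s to {seg.start_s:.1f}s")
        cursor = max(cursor, seg.end_s)
    if cursor < episode_length_s:
        problems.append(f"gap from {cursor:.1f}s to {episode_length_s:.1f}s")
    return problems


segments = [
    Segment(0.0, 2.5, "reach for cup", ["cup"]),
    Segment(2.5, 4.0, "grasp cup", ["cup"]),
    Segment(5.0, 8.0, "place cup on shelf", ["cup", "shelf"]),
]
print(validate_segments(segments, episode_length_s=8.0))
```

A check like this flags the unannotated second between the grasp and the placement, which is the kind of missing temporal detail the article attributes to community-contributed datasets.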