A popular self-driving car dataset is missing labels for hundreds of pedestrians

Blog post from Roboflow

Post Details
Company: Roboflow
Author: Brad Dwyer
Word Count: 417
Language: English
Summary

Machine learning, particularly in self-driving cars, offers significant societal benefits but also carries serious risks, and the quality of training data is crucial to the safety and reliability of these systems. A widely used open-source self-driving car dataset, known as Udacity Dataset 2, was found to contain substantial errors and omissions. Vehicles, pedestrians, cyclists, and other critical objects were left unlabeled, and 33% of its 15,000 images were affected. These inaccuracies show the danger of training machine learning models on flawed datasets. They also underscore the data community's responsibility to ensure that shared data is accurate and complete, especially when public safety is at stake. The dataset was subsequently corrected and re-released with accurate annotations, and users are urged to switch to the updated version to make their projects more reliable.