py123d + FiftyOne: Explore Autonomous Driving Datasets
Blog post from Voxel51
Autonomous driving research is hampered by fragmented datasets: each has its own formats and conventions, which makes cross-dataset training cumbersome. The open-source library py123d addresses this by converting data from various autonomous vehicle datasets into a unified Apache Arrow format, so that cameras, lidar, HD maps, and labels are all accessible through a single API. It uses memory-mapped reads, which avoid loading whole files into memory, and supports multiple sensor codecs without duplicating storage. FiftyOne complements this with an interactive, browser-based interface for exploring and comparing datasets, with features such as label filtering, dataset-scale querying, and cross-dataset comparisons. Together, py123d and FiftyOne streamline ingesting and understanding autonomous driving data, letting researchers manage, visualize, and analyze large datasets from different sources.
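To make the idea of memory-mapped reads concrete, here is a minimal sketch using only the Python standard library. It is not py123d's API and does not use the Arrow format. The record layout (`RECORD`), the file name `points.bin`, and the helper names are all hypothetical, chosen for illustration. The sketch only shows the general principle: when records sit at predictable offsets in a file, a reader can map the file and decode one record without reading the rest.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record, for illustration only:
# timestamp in microseconds (int64) followed by x, y, z (float32), little-endian.
RECORD = struct.Struct("<qfff")


def write_records(path, records):
    """Write records back to back, so record i starts at byte i * RECORD.size."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(RECORD.pack(*rec))


def read_record(path, index):
    """Memory-map the file and decode only the requested record."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS pages data in lazily on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return RECORD.unpack_from(mm, index * RECORD.size)


def demo():
    """Write 1000 synthetic records, then fetch record 500 through the mapping."""
    records = [(1_000_000 * i, float(i), float(i) * 2.0, 0.5) for i in range(1000)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "points.bin")
        write_records(path, records)
        return read_record(path, 500)


if __name__ == "__main__":
    print(demo())
```

Arrow applies the same principle to columnar data with a self-describing schema, which is presumably what lets a single API serve cameras, lidar, maps, and labels from one on-disk layout.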