Exploring the TACO Dataset [Data Analysis]

Post Details

Company

Encord

Date Published

Jan. 11, 2023

Author

Görkem Polat

Word Count

3,040

Language

English

Hacker News Points

-

Source URL

encord.com/blog/taco-dataset-guide

Summary

This post analyzes the Trash Annotations in Context (TACO) dataset, which contains images of litter on a variety of backgrounds. The authors use Encord Active, a platform for data analysis and model training, to pre-process the dataset, train a Mask R-CNN model, and evaluate its performance. They find that object area, frame object density, and object count have the greatest impact on model performance, and that most objects are very small. Annotation quality is significantly worse in the unofficial dataset than in the official one. The model performs well on larger objects but struggles with small, undefined objects. Visualizing true positive and false positive samples in Encord Active helps the authors identify a class mismatch problem, which can be addressed by improving data quality or by adding specific post-processing steps to the inference pipeline.
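
To make the metrics in the summary concrete, here is a minimal sketch of how relative object area and per-image object count could be computed from annotations. It is not the authors' Encord Active workflow. It assumes the annotations are available as a COCO-style dictionary (with "images" entries carrying "width" and "height", and "annotations" entries carrying "image_id" and a "bbox" of [x, y, width, height]). The function name `object_stats`, the 1% threshold for a "small" object, and the toy data are all hypothetical choices made for illustration.

```python
from collections import Counter


def object_stats(coco, small_fraction=0.01):
    """Summarize object sizes and counts for a COCO-style annotation dict.

    `small_fraction` is an illustrative threshold (an assumption, not taken
    from the post): a box covering less than this share of its image is
    counted as "small".
    """
    # Map each image id to its area in pixels.
    image_area = {
        img["id"]: img["width"] * img["height"] for img in coco["images"]
    }

    relative_areas = []  # bounding-box area divided by image area
    objects_per_image = Counter()

    for ann in coco["annotations"]:
        _, _, box_w, box_h = ann["bbox"]  # COCO boxes are [x, y, w, h]
        relative_areas.append(box_w * box_h / image_area[ann["image_id"]])
        objects_per_image[ann["image_id"]] += 1

    small = sum(1 for area in relative_areas if area < small_fraction)
    return {
        "relative_areas": relative_areas,
        "objects_per_image": dict(objects_per_image),
        "small_share": small / len(relative_areas) if relative_areas else 0.0,
    }


# Toy data for illustration only; these are not values from TACO.
toy = {
    "images": [
        {"id": 1, "width": 100, "height": 100},
        {"id": 2, "width": 200, "height": 100},
    ],
    "annotations": [
        {"image_id": 1, "bbox": [0, 0, 10, 5]},
        {"image_id": 1, "bbox": [0, 0, 50, 50]},
        {"image_id": 2, "bbox": [0, 0, 10, 10]},
    ],
}
stats = object_stats(toy)
```

Using bounding-box area relative to the image, rather than raw pixel area, keeps the measure comparable across images of different resolutions. Polygon mask area would be a tighter estimate if segmentation data is present.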