Company
Date Published
Author
Caleb Chiam, Luke Mainwaring, Yiming Chen
Word count
2388
Language
English
Hacker News points
None

Summary

The Cleanlab Vizzy is an interactive visualization of confident learning, a family of data-centric AI theory and algorithms for automatically identifying and correcting label errors in datasets. The algorithm generates out-of-sample predicted probabilities for every datapoint in a dataset, computes percentile thresholds for each label class from these probabilities, and uses the thresholds to separate datapoints that are likely to have a given label from those that are unlikely to have it. Unlike naive approaches, the Cleanlab approach takes the model's confidence in its predictions into account, and it distinguishes images that are out of distribution from images where the model is unconfident but still has sufficient grounds to make a determination. The visualization is built with React, TypeScript, and Chakra UI and displays the algorithm's results on an image dataset, with sliders that let users adjust the class and out-of-distribution thresholds. The code for the visualization is open source, and the Cleanlab package provides additional algorithms that automatically identify examples that are outliers or out of distribution.
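
The thresholding steps described above can be illustrated with a minimal Python sketch. It is not the Vizzy's or the Cleanlab package's implementation: the function names (`percentile`, `class_thresholds`, `categorize`), the toy probabilities, and the 50th and 15th percentile settings are all hypothetical. It also assumes that each class threshold is computed only over datapoints whose given label is that class, which mirrors how confident learning defines per-class thresholds but is not confirmed by the summary.

```python
from typing import List


def percentile(values: List[float], pct: float) -> float:
    """Linear-interpolation percentile of a non-empty list, pct in [0, 100]."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    rank = (pct / 100) * (len(ordered) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (rank - lo) * (ordered[hi] - ordered[lo])


def class_thresholds(pred_probs, given_labels, n_classes, pct):
    """One threshold per class, from the datapoints given that class label.

    Assumption: every class has at least one labeled datapoint.
    """
    result = []
    for k in range(n_classes):
        own = [row[k] for row, y in zip(pred_probs, given_labels) if y == k]
        result.append(percentile(own, pct))
    return result


def categorize(pred_probs, given_labels, upper, lower):
    """Label each datapoint 'ok', 'label_issue', or 'out_of_distribution'."""
    categories = []
    for row, given in zip(pred_probs, given_labels):
        classes = range(len(row))
        if all(row[k] < lower[k] for k in classes):
            # Unlikely to belong to any class at all.
            categories.append("out_of_distribution")
        elif row[given] < upper[given] and any(
            row[k] >= upper[k] for k in classes if k != given
        ):
            # Not confidently the given class, but confidently another one.
            categories.append("label_issue")
        else:
            # Confident in the given label, or unconfident but not an outlier.
            categories.append("ok")
    return categories


# Toy out-of-sample predicted probabilities for classes 0 (cat) and 1 (dog).
pred_probs = [
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.1, 0.9],  # given label: cat
    [0.2, 0.8], [0.1, 0.9], [0.3, 0.7], [0.5, 0.5],  # given label: dog
]
given_labels = [0, 0, 0, 0, 1, 1, 1, 1]

upper = class_thresholds(pred_probs, given_labels, 2, pct=50)  # class thresholds
lower = class_thresholds(pred_probs, given_labels, 2, pct=15)  # OOD thresholds
categories = categorize(pred_probs, given_labels, upper, lower)
print(categories)  # the datapoint at index 3 is flagged as a label issue
```

Changing the two `pct` arguments plays the role of the sliders in the visualization: raising the class percentile makes the sketch stricter about calling a label likely, and raising the out-of-distribution percentile marks more datapoints as belonging to no class.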