How we built Cleanlab Vizzy

Post Details

Company

Cleanlab

Date Published

Aug. 17, 2022

Author

Caleb Chiam, Luke Mainwaring, Yiming Chen

Word Count

2,388

Language

English

Hacker News Points

-

Source URL

cleanlab.ai/blog/cleanlab-vizzy

Summary

Cleanlab Vizzy is an interactive visualization of confident learning, a data-centric AI family of theory and algorithms for automatically finding and correcting label errors in datasets. The algorithm works in three steps. First, it generates out-of-sample predicted probabilities for every datapoint in the dataset, for example via cross-validation. Next, it computes a percentile threshold for each label class from these probabilities. Finally, it uses these thresholds to decide whether each datapoint is likely or unlikely to have a given label. The Cleanlab approach addresses the shortcomings of naive approaches in two ways: it takes the model's confidence in its predictions into account, and it distinguishes images that are out of distribution from those where the model is unconfident but still has sufficient grounds to make a determination. The visualization is built with React, TypeScript, and Chakra UI and displays the algorithm's results on an image dataset, with sliders that let users adjust the class and out-of-distribution thresholds. The code for the visualization is open source, and the Cleanlab package provides additional algorithms that can automatically identify outliers and out-of-distribution examples.
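The thresholding steps summarized above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Vizzy's TypeScript code or the cleanlab package's API. Three details are hypothetical choices. Each class threshold is the `pct`-th percentile of the predicted probability for that class among examples *given* that label. A datapoint is flagged as out of distribution when its highest predicted probability falls below a separate `ood_threshold`. The names `class_thresholds` and `classify` are illustrative only.

```python
from typing import List, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Linear-interpolation percentile (same convention as NumPy's default)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of an empty sequence")
    pos = (len(xs) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def class_thresholds(pred_probs: List[List[float]], labels: List[int], pct: float) -> List[float]:
    """Hypothetical per-class threshold: pct-th percentile of p(class k) over examples labeled k."""
    n_classes = len(pred_probs[0])
    thresholds = []
    for k in range(n_classes):
        self_conf = [p[k] for p, y in zip(pred_probs, labels) if y == k]
        # A class with no labeled examples gets an unreachable threshold.
        thresholds.append(percentile(self_conf, pct) if self_conf else float("inf"))
    return thresholds


def classify(pred_probs: List[List[float]], labels: List[int],
             pct: float = 50.0, ood_threshold: float = 0.3) -> List[str]:
    """Assign each datapoint a verdict; pct and ood_threshold play the role of the two sliders."""
    thresholds = class_thresholds(pred_probs, labels, pct)
    verdicts = []
    for probs, given in zip(pred_probs, labels):
        # Assumed out-of-distribution rule: the model favors no class strongly enough.
        if max(probs) < ood_threshold:
            verdicts.append("out_of_distribution")
            continue
        # Classes whose predicted probability clears that class's threshold.
        likely = [k for k, p in enumerate(probs) if p >= thresholds[k]]
        if given in likely:
            verdicts.append("ok")
        elif likely:
            # Another class clears its threshold but the given label does not: likely label error.
            best = max(likely, key=lambda k: probs[k])
            verdicts.append(f"label_issue:suggest_{best}")
        else:
            # No class clears its threshold, so the model is unconfident; leave the label alone.
            verdicts.append("unconfident")
    return verdicts


if __name__ == "__main__":
    # Toy out-of-sample probabilities for a two-class problem (made-up numbers).
    pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9],
                  [0.85, 0.15], [0.52, 0.48], [0.7, 0.3]]
    labels = [0, 0, 1, 1, 1, 0, 0]
    print(classify(pred_probs, labels, pct=50.0, ood_threshold=0.55))
```

On this toy data, the fifth example is labeled 1 but clears only class 0's threshold, so it is reported as a likely label issue. The sixth example's top probability sits below the out-of-distribution cutoff. The last example clears no threshold and is treated as unconfident rather than mislabeled. Raising `pct` tightens every class threshold at once, which is roughly the kind of trade-off the sliders let users explore.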