Company
Date Published
Author
Hui Wen Goh, Jonas Mueller, Anish Athalye
Word count
1518
Language
English
Hacker News points
5

Summary

Cleanlab Studio automates the deployment of machine learning (ML) models: it trains a baseline model, detects issues in the data, identifies the best model for the dataset, retrains that model on the corrected data, and deploys it. To learn what doesn't look right in a dataset, the tool combines several AutoML systems and foundation models, choosing the combination by data type: for text datasets, large pretrained LLMs and fine-tuned Transformer networks; for image datasets, CLIP/DINOv2 and fine-tuned computer vision networks; and for tabular datasets, text models, neural architectures designed specifically for tabular data, and tree ensembles such as gradient boosting. Users can quickly correct the issues detected in their original dataset, retrain the model on the improved data, and deploy it with just a few clicks. Cleanlab Studio is reported to outperform state-of-the-art models, including OpenAI large language models, improving the accuracy of deployed ML models, reducing errors by up to 28%, and producing predictions quickly and at low cost. The tool applies well beyond text datasets and can handle arbitrary data types.
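
The detect, correct, and retrain loop described above can be illustrated with a minimal, self-contained Python sketch. It does not use Cleanlab Studio or its API. Everything in it is an assumption made for illustration: the synthetic one-dimensional dataset, the nearest-centroid classifier standing in for the AutoML model, the hypothetical helper names (`make_data`, `fit_centroids`, `out_of_sample_predictions`, `run_pipeline`), and the choice to accept every suggested label automatically, which stands in for the review step a user would perform.

```python
import random


def make_data(n=200, noise=0.1, seed=0):
    """Toy 1-D data (assumed): two clusters, with a fraction of labels flipped."""
    rng = random.Random(seed)
    xs, true_labels, given_labels = [], [], []
    for i in range(n):
        y = i % 2                            # alternate the true class
        xs.append(rng.gauss(6.0 * y, 1.0))   # class 0 near 0, class 1 near 6
        true_labels.append(y)
        flipped = rng.random() < noise       # simulate an annotation error
        given_labels.append(1 - y if flipped else y)
    return xs, true_labels, given_labels


def fit_centroids(xs, ys):
    """'Train' a nearest-centroid classifier: one mean per class."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}


def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def out_of_sample_predictions(xs, ys, k=5):
    """Predict each example with a model that was not trained on it (k-fold)."""
    preds = [None] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit_centroids([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            preds[i] = predict(model, xs[i])
    return preds


def run_pipeline(seed=0):
    xs, true_labels, given_labels = make_data(seed=seed)

    # 1. Detect: flag examples whose given label disagrees with the
    #    out-of-sample prediction of a baseline model.
    preds = out_of_sample_predictions(xs, given_labels)
    flagged = [i for i, (p, y) in enumerate(zip(preds, given_labels)) if p != y]

    # 2. Correct: accept the suggested label (a human would review these).
    corrected = list(given_labels)
    for i in flagged:
        corrected[i] = preds[i]

    # 3. Retrain on the corrected labels; this is the model to deploy.
    final_model = fit_centroids(xs, corrected)

    def count_errors(labels):
        return sum(a != b for a, b in zip(labels, true_labels))

    return {
        "n_flagged": len(flagged),
        "errors_before": count_errors(given_labels),
        "errors_after": count_errors(corrected),
        "model": final_model,
    }


result = run_pipeline()
print("flagged examples:", result["n_flagged"])
print("label errors before/after correction:",
      result["errors_before"], "/", result["errors_after"])
```

The true labels are available here only because the data is synthetic, which makes it possible to count label errors before and after correction. On a real dataset the flagged examples would be a review queue rather than a set of automatic fixes.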