How To Train and Deploy Reliable Models on Messy Real-World Data With a Few Clicks

Company

Cleanlab

Date Published

July 24, 2023

Author

Hui Wen Goh, Jonas Mueller, Anish Athalye

Word count

1518

Language

English

URL

cleanlab.ai/blog/model-deployment

Summary

Cleanlab Studio automates the deployment of machine learning (ML) models. It detects and corrects issues in the data, trains a baseline model, identifies the best model for the dataset, retrains that model on the corrected data, and deploys it. To find what looks wrong in a dataset, the tool draws on several AutoML systems and foundation models. For modeling, it applies the best-performing combination of model types for each data modality: large pretrained LLMs and fine-tuned Transformer networks for text datasets; CLIP/DINOv2 and fine-tuned computer vision networks for image datasets; and text models, neural architectures designed specifically for tabular data, and tree ensembles such as gradient boosting for tabular datasets. With just a few clicks, users can correct the issues detected in their original dataset, retrain the model on the improved data, and deploy it. Models deployed this way have been shown to outperform state-of-the-art alternatives, including OpenAI large language models, reducing errors by up to 28% while making predictions quickly and at low cost. The tool is useful in many applications beyond text and can handle arbitrary data types.
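The summary does not say how data issues are detected. As a minimal sketch only, the snippet below assumes a simplified confident-learning-style rule (the idea behind the open-source cleanlab library, not Cleanlab Studio's actual internals). Each class gets a threshold equal to the mean predicted probability of that class among examples given that label. An example is flagged when the most likely class, among those that clear their threshold, differs from its given label. The function name `flag_label_issues` and the toy data are hypothetical. `pred_probs` is assumed to come from out-of-sample predictions, for example via cross-validation, of the kind a baseline model would produce.

```python
from typing import List, Sequence


def flag_label_issues(labels: Sequence[int],
                      pred_probs: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of examples whose given label looks wrong.

    Hypothetical helper illustrating a simplified confident-learning-style
    rule; it is not Cleanlab Studio's API or implementation.
    """
    num_classes = len(pred_probs[0])

    # Per-class threshold: mean predicted probability of class j
    # among the examples that were *given* label j.
    thresholds = []
    for j in range(num_classes):
        probs_j = [p[j] for p, y in zip(pred_probs, labels) if y == j]
        # A class with no labeled examples gets an unreachable threshold.
        thresholds.append(sum(probs_j) / len(probs_j) if probs_j else float("inf"))

    issues = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        # Classes the model predicts with at least "typical" confidence.
        confident = [j for j in range(num_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # no confident class, so leave this label alone
        likely = max(confident, key=lambda j: p[j])
        if likely != y:
            issues.append(i)
    return issues


# Toy example: example 2 is labeled 0, but the model strongly predicts class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.15, 0.85],
    [0.3, 0.7],
    [0.1, 0.9],
]
issues = flag_label_issues(labels, pred_probs)
print(issues)  # [2]

# "Retrain on corrected data": here the flagged examples are simply dropped.
keep = [i for i in range(len(labels)) if i not in set(issues)]
```

In this sketch, flagged examples are dropped before retraining. A workflow like the one described above would instead let a user review and relabel them, which keeps more training data.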