
🔭 Improving Your ML Datasets With Galileo, Part 1

Blog post from Galileo

Post Details
Company: Galileo
Author: Ben Epstein
Word Count: 1,423
Language: English
Summary

Galileo tackles data quality issues by using its platform to analyze various benchmark datasets from academia and industry, surfacing critical errors and ambiguities within minutes. On the 20 Newsgroups classification dataset, for example, Galileo flags 6.5% of samples across the dataset as malformed, including empty or ill-formed samples that add confusion during training. Galileo's Data Error Potential (DEP) score quickly uncovers data errors that would otherwise be found only through ad-hoc exploration, so they can be discovered and fixed rapidly. Fixing these dataset errors improves model performance: in this experiment, overall performance improved by 7.24%. The result highlights the importance of ML Data Intelligence for handling an essential step in the ML lifecycle.