Building a fully reproducible machine learning pipeline with Comet.ml and Quilt

Post Details

Company

Comet

Date Published

May 13, 2019

Author

Gideon Mendels

Word Count

2,096

Language

English

Hacker News Points

-

Source URL

www.comet.ml/site/building-a-fully-reproducible-machine-learning-pipeline-with-comet-ml-and-quilt

Summary

The tutorial walks through building a reproducible end-to-end machine learning pipeline for fruit classification. It trains a Keras multi-class image classification model on a custom dataset drawn from Google Open Images, with the data and experiments managed using Quilt T4 and Comet.ml. The process begins by creating a targeted dataset: specific fruit images are selected from the much larger Open Images Dataset and then preprocessed to address class imbalance, particularly the over-representation of certain fruits such as bananas. The tutorial next builds a baseline convolutional neural network (CNN), then moves to transfer learning with a pre-trained network, InceptionV3, to improve classification accuracy. Comet.ml is used to track experiments, log results, and support reproducibility by capturing metrics, model details, and environment settings. The guide emphasizes that machine learning pipelines are iterative, that both data and models need to be versioned, and that combining data and model versioning tools makes experiments easier to share and reproduce.
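
The class-imbalance step mentioned in the summary can be illustrated with a small sketch. The summary does not say which balancing method the tutorial uses, so this assumes plain random downsampling of every class to the size of the smallest one. The function name `downsample_to_smallest`, the labels, and the file names are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def downsample_to_smallest(samples, seed=0):
    """Return a class-balanced subset of (path, label) pairs.

    Assumption: balancing is done by random downsampling, so every
    class is cut down to the size of the smallest class.
    """
    # Group the samples by their class label.
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))
    if not by_label:
        return []

    # The smallest class sets the per-class sample count.
    target = min(len(items) for items in by_label.values())

    # A seeded generator keeps the selection reproducible across runs.
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    return balanced


# Hypothetical file list in which bananas are over-represented.
samples = (
    [(f"banana_{i}.jpg", "banana") for i in range(50)]
    + [(f"apple_{i}.jpg", "apple") for i in range(10)]
    + [(f"orange_{i}.jpg", "orange") for i in range(12)]
)

balanced = downsample_to_smallest(samples)
counts = Counter(label for _, label in balanced)
print(counts)
```

Fixing the seed means the same subset is selected on every run, which fits the reproducibility goal of the pipeline. Downsampling discards data, so class weighting or augmentation of the smaller classes may be preferable when images are scarce.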