Cooking up machine learning models: A deep dive into the supervised learning pipeline
Blog post from Elastic
Building a machine learning model in the Elastic Stack follows a structured supervised learning pipeline, much like a cooking process in which precise steps and a dash of creativity are both essential.

The process begins with data preprocessing: the data is reindexed and divided into training and test sets using a sampling method appropriate to the problem (for example, stratified sampling for imbalanced classification). Feature selection follows, where dependencies between features and the target are estimated with methods such as the maximal information coefficient (MIC) and minimum redundancy maximum relevance (mRMR), alongside encoding techniques for categorical data.

Hyperparameter optimization then proceeds in two phases, a coarse sweep followed by fine-tuning, with techniques such as Bayesian optimization used to find a good configuration. The final training phase uses these optimized parameters to train the model, yielding efficient and accurate predictions that are stored in Elasticsearch indices. The inference phase evaluates the test set and stores the results for further analysis, and the platform's capabilities let users with limited machine learning expertise build robust models.
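To make the problem-dependent sampling step concrete, here is a minimal sketch of a stratified train/test split in plain Python. The function and the toy dataset are illustrative assumptions, not Elastic's internal implementation; the point is only that the split preserves the class balance in both sets.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_key, test_fraction=0.2, seed=7):
    """Split rows into train/test sets while preserving the class
    balance of `label_key` in both sets (a problem-dependent
    sampling choice suited to imbalanced classification)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    train, test = [], []
    for group in by_class.values():
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Toy dataset: 20 "yes" labels among 100 rows (hypothetical data)
data = [{"x": i, "label": "yes" if i % 5 == 0 else "no"} for i in range(100)]
train, test = stratified_split(data, "label")
```

Because each class is split independently, the 20% minority rate in the full dataset is reproduced in both the training and test sets, which a naive random split would not guarantee on small data.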
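The feature-selection step scores how strongly each feature depends on the target. A full MIC implementation is involved; as a simpler stand-in that illustrates the same relevance-scoring idea, the sketch below estimates plain mutual information between discrete features and the target. The function name and sample values are assumptions for illustration.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from paired discrete samples.
    Higher values mean the feature tells us more about the target."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# A feature identical to the target carries maximal information;
# a constant feature carries none.
target =    [0, 0, 1, 1, 0, 1, 0, 1]
feature_a = [0, 0, 1, 1, 0, 1, 0, 1]   # perfectly informative: 1.0 bit
feature_b = [1, 1, 1, 1, 1, 1, 1, 1]   # uninformative: 0.0 bits
```

Ranking features by a score like this is the "relevance" half of mRMR; the "redundancy" half additionally penalizes features that duplicate information already captured by features selected earlier.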
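The two-phase coarse/fine structure of the hyperparameter search can be sketched with random sampling. Real Bayesian optimization would fit a surrogate model of the objective to pick each next trial; this simplified random variant, with a hypothetical quadratic "validation loss", shows only the coarse-sweep-then-refine shape.

```python
import random

def coarse_to_fine_search(objective, low, high,
                          coarse_trials=20, fine_trials=20, seed=0):
    """Two-phase search: sample the full range coarsely, then refine
    around the best coarse candidate. Bayesian optimization would
    choose trials adaptively instead of sampling blindly."""
    rng = random.Random(seed)
    # Phase 1: coarse sweep over the whole range
    best = min((rng.uniform(low, high) for _ in range(coarse_trials)),
               key=objective)
    # Phase 2: fine sweep in a narrow window around the coarse optimum
    window = (high - low) * 0.1
    fine = (rng.uniform(max(low, best - window), min(high, best + window))
            for _ in range(fine_trials))
    return min([best, *fine], key=objective)

# Hypothetical objective: validation loss minimized at learning_rate = 0.3
loss = lambda lr: (lr - 0.3) ** 2
best_lr = coarse_to_fine_search(loss, 0.0, 1.0)
```

The coarse phase cheaply locates the promising region; the fine phase then spends its trial budget where improvements are most likely, which is the same economy a Bayesian optimizer achieves with a learned surrogate.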