Cooking up machine learning models: A deep dive into the supervised learning pipeline
Blog post from Elastic
Building a machine learning model in the Elastic Stack follows a structured supervised learning pipeline, much like a cooking process in which precise steps and a dash of creativity are both essential.

The process begins with data preprocessing: the data is reindexed and divided into training and test sets using a sampling method appropriate to the problem (for example, stratified sampling for imbalanced classification). Feature selection follows, where dependencies between features and the target are estimated with methods such as the maximal information coefficient (MIC) and minimum redundancy maximum relevance (mRMR), alongside encoding techniques for categorical data.

Hyperparameter optimization then proceeds in two phases, a coarse sweep followed by fine-tuning, with techniques such as Bayesian optimization used to find a good configuration. The final training phase uses these optimized parameters to train the model, yielding efficient and accurate predictions that are stored in Elasticsearch indices. The inference phase evaluates the test set and stores the results for further analysis, and the platform's capabilities let users with limited machine learning expertise build robust models.
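To make the problem-dependent sampling step concrete, here is a minimal sketch of a stratified train/test split in plain Python. The function and the toy dataset are illustrative assumptions, not Elastic's internal implementation; the point is only that the split preserves the class balance in both sets.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_key, test_fraction=0.2, seed=7):
    """Split rows into train/test sets while preserving the class
    balance of `label_key` in both sets (a problem-dependent
    sampling choice suited to imbalanced classification)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    train, test = [], []
    for group in by_class.values():
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Toy dataset: 20 "yes" labels among 100 rows (hypothetical data)
data = [{"x": i, "label": "yes" if i % 5 == 0 else "no"} for i in range(100)]
train, test = stratified_split(data, "label")
```

Because each class is split independently, the 20% minority rate in the full dataset is reproduced in both the training and test sets, which a naive random split would not guarantee on small data.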
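The feature-selection step scores how strongly each feature depends on the target. A full MIC implementation is involved; as a simpler stand-in that illustrates the same relevance-scoring idea, the sketch below estimates plain mutual information between discrete features and the target. The function name and sample values are assumptions for illustration.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from paired discrete samples.
    Higher values mean the feature tells us more about the target."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# A feature identical to the target carries maximal information;
# a constant feature carries none.
target =    [0, 0, 1, 1, 0, 1, 0, 1]
feature_a = [0, 0, 1, 1, 0, 1, 0, 1]   # perfectly informative: 1.0 bit
feature_b = [1, 1, 1, 1, 1, 1, 1, 1]   # uninformative: 0.0 bits
```

Ranking features by a score like this is the "relevance" half of mRMR; the "redundancy" half additionally penalizes features that duplicate information already captured by features selected earlier.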
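The two-phase coarse/fine structure of the hyperparameter search can be sketched with random sampling. Real Bayesian optimization would fit a surrogate model of the objective to pick each next trial; this simplified random variant, with a hypothetical quadratic "validation loss", shows only the coarse-sweep-then-refine shape.

```python
import random

def coarse_to_fine_search(objective, low, high,
                          coarse_trials=20, fine_trials=20, seed=0):
    """Two-phase search: sample the full range coarsely, then refine
    around the best coarse candidate. Bayesian optimization would
    choose trials adaptively instead of sampling blindly."""
    rng = random.Random(seed)
    # Phase 1: coarse sweep over the whole range
    best = min((rng.uniform(low, high) for _ in range(coarse_trials)),
               key=objective)
    # Phase 2: fine sweep in a narrow window around the coarse optimum
    window = (high - low) * 0.1
    fine = (rng.uniform(max(low, best - window), min(high, best + window))
            for _ in range(fine_trials))
    return min([best, *fine], key=objective)

# Hypothetical objective: validation loss minimized at learning_rate = 0.3
loss = lambda lr: (lr - 0.3) ** 2
best_lr = coarse_to_fine_search(loss, 0.0, 1.0)
```

The coarse phase cheaply locates the promising region; the fine phase then spends its trial budget where improvements are most likely, which is the same economy a Bayesian optimizer achieves with a learned surrogate.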