Building a "Propensity to Convert" machine learning model with Snowpark and Snowplow web tracking - Part 2
Blog post from Snowplow
The guide outlines the process of building a propensity-to-convert machine learning model using Snowplow and Snowflake, focusing on steps such as splitting the data into folds, modeling, scoring, and ultimately deploying the model into Snowflake.

It highlights the challenge of class imbalance and the techniques used to handle it: class weighting, under-sampling, and over-sampling with SMOTE. It also discusses how feature selection and imputation improve model performance.

The guide emphasizes the use of SHAP for interpreting feature importance and walks through a classification pipeline built on logistic regression and LightGBM, with LightGBM performing markedly better on categorical data.

Additionally, it elaborates on hyperparameter tuning with GridSearch and HalvingGridSearchCV, and on the practical challenges of deploying models as user-defined functions (UDFs) in Snowflake.

Although the F2 scores achieved remain modest due to data limitations and class imbalance, the guide concludes that engagement metrics significantly enhance model performance, especially in non-linear models.
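The post's own code is not reproduced in this summary, but a few hedged sketches illustrate the techniques it names. For the class-imbalance step, over-sampling with imbalanced-learn's SMOTE might look like the following; the synthetic data and the 1% positive rate are stand-ins for the real Snowplow feature table:

```python
# Sketch: rebalancing a skewed conversion label with SMOTE.
# The data here is synthetic; the real input is the Snowplow feature table.
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Roughly 1% positives to mimic a rare conversion event.
X, y = make_classification(n_samples=10_000, weights=[0.99, 0.01], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

print("before:", Counter(y_train))
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
print("after: ", Counter(y_res))  # classes are now balanced
```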
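Imputation typically sits ahead of the estimator in a scikit-learn pipeline. A minimal sketch pairing a median imputer with a class-weighted logistic-regression baseline, on illustrative data rather than the post's features:

```python
# Sketch: imputation ahead of a logistic-regression baseline.
# The strategy and data are illustrative, not the post's exact pipeline.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 0.5], [4.0, 1.0]] * 50)
y = np.array([0, 0, 0, 1] * 50)

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),          # fill missing values
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
pipe.fit(X, y)
print(pipe.predict_proba(X[:2])[:, 1])  # conversion probabilities
```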
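One reason LightGBM tends to outperform logistic regression on data like this is that it consumes pandas categorical columns natively, with no one-hot encoding required. A self-contained sketch, where the column names are hypothetical:

```python
# Sketch: LightGBM consuming a pandas Categorical column directly.
# Column names ("device_class", "engaged_time") are hypothetical.
import pandas as pd
from lightgbm import LGBMClassifier

df = pd.DataFrame({
    "device_class": pd.Categorical(["mobile", "desktop", "mobile", "tablet"] * 250),
    "engaged_time": range(1000),
    "converted": [0] * 990 + [1] * 10,
})
X, y = df[["device_class", "engaged_time"]], df["converted"]

# class_weight="balanced" is one of the imbalance remedies the post discusses.
model = LGBMClassifier(class_weight="balanced", random_state=42)
model.fit(X, y)  # the Categorical dtype is split on natively
print(model.predict_proba(X.head(3))[:, 1])
```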
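For interpretation, SHAP values from a tree explainer give a global feature ranking. A self-contained sketch on synthetic data:

```python
# Sketch: ranking features by mean absolute SHAP value for a tree model.
import numpy as np
import shap
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=8, random_state=42)
model = LGBMClassifier(random_state=42).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
# Older shap versions return a per-class list for binary models.
if isinstance(shap_values, list):
    shap_values = shap_values[1]
print(np.abs(shap_values).mean(axis=0))  # mean |SHAP| = global importance
```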
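HalvingGridSearchCV is still experimental in scikit-learn and requires an explicit enabling import. A sketch with an illustrative parameter grid, scored with the F2 metric (which weights recall over precision) that the post evaluates against:

```python
# Sketch: successive-halving search over an illustrative LightGBM grid.
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import HalvingGridSearchCV

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=42)

search = HalvingGridSearchCV(
    LGBMClassifier(class_weight="balanced", random_state=42),
    {"num_leaves": [15, 31, 63], "learning_rate": [0.05, 0.1]},
    scoring=make_scorer(fbeta_score, beta=2),  # F2: recall-weighted
    factor=2,  # keep the better half of the candidates each round
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```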
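Finally, deploying the model as a Snowflake UDF with Snowpark usually means registering a Python handler that loads a staged model file. The connection parameters, stage name, model file, and feature columns below are all hypothetical:

```python
# Sketch: registering a scoring UDF with Snowpark.
# Credentials, the "@model_stage" stage, model.joblib, and the two input
# features are hypothetical placeholders.
import sys

import joblib
from snowflake.snowpark import Session
from snowflake.snowpark.types import FloatType

session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
}).create()

def score(engaged_time: float, page_views: float) -> float:
    # Runs inside Snowflake: load the model staged alongside the UDF.
    # A production handler would cache the load instead of repeating it per row.
    import_dir = sys._xoptions["snowflake_import_directory"]
    model = joblib.load(import_dir + "model.joblib")
    return float(model.predict_proba([[engaged_time, page_views]])[0][1])

session.udf.register(
    score,
    name="propensity_score",
    return_type=FloatType(),
    input_types=[FloatType(), FloatType()],
    packages=["scikit-learn", "lightgbm", "joblib"],
    imports=["@model_stage/model.joblib"],
    is_permanent=True,
    stage_location="@model_stage",
    replace=True,
)
```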