Building a "Propensity to Convert" machine learning model with Snowpark and Snowplow web tracking - Part 2
Blog post from Snowplow
The guide outlines the process of building a propensity-to-convert machine learning model using Snowplow and Snowflake, focusing on steps such as splitting the data into folds, modeling, scoring, and ultimately deploying the model into Snowflake.

It highlights the challenge of class imbalance and the techniques used to handle it: class weighting, under-sampling, and over-sampling with SMOTE. It also discusses how feature selection and imputation improve model performance.

The guide emphasizes the use of SHAP for interpreting feature importance and walks through a classification pipeline built on logistic regression and LightGBM, with LightGBM performing markedly better on categorical data.

Additionally, it elaborates on hyperparameter tuning with GridSearch and HalvingGridSearchCV, and on the practical challenges of deploying models as user-defined functions (UDFs) in Snowflake.

Although the F2 scores achieved remain modest due to data limitations and class imbalance, the guide concludes that engagement metrics significantly enhance model performance, especially in non-linear models.
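The post's own code is not reproduced in this summary, but a few hedged sketches illustrate the techniques it names. For the class-imbalance step, over-sampling with imbalanced-learn's SMOTE might look like the following; the synthetic data and the 1% positive rate are stand-ins for the real Snowplow feature table:

```python
# Sketch: rebalancing a skewed conversion label with SMOTE.
# The data here is synthetic; the real input is the Snowplow feature table.
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Roughly 1% positives to mimic a rare conversion event.
X, y = make_classification(n_samples=10_000, weights=[0.99, 0.01], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

print("before:", Counter(y_train))
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
print("after: ", Counter(y_res))  # classes are now balanced
```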
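Imputation typically sits ahead of the estimator in a scikit-learn pipeline. A minimal sketch pairing a median imputer with a class-weighted logistic-regression baseline, on illustrative data rather than the post's features:

```python
# Sketch: imputation ahead of a logistic-regression baseline.
# The strategy and data are illustrative, not the post's exact pipeline.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 0.5], [4.0, 1.0]] * 50)
y = np.array([0, 0, 0, 1] * 50)

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),          # fill missing values
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
pipe.fit(X, y)
print(pipe.predict_proba(X[:2])[:, 1])  # conversion probabilities
```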
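One reason LightGBM tends to outperform logistic regression on data like this is that it consumes pandas categorical columns natively, with no one-hot encoding required. A self-contained sketch, where the column names are hypothetical:

```python
# Sketch: LightGBM consuming a pandas Categorical column directly.
# Column names ("device_class", "engaged_time") are hypothetical.
import pandas as pd
from lightgbm import LGBMClassifier

df = pd.DataFrame({
    "device_class": pd.Categorical(["mobile", "desktop", "mobile", "tablet"] * 250),
    "engaged_time": range(1000),
    "converted": [0] * 990 + [1] * 10,
})
X, y = df[["device_class", "engaged_time"]], df["converted"]

# class_weight="balanced" is one of the imbalance remedies the post discusses.
model = LGBMClassifier(class_weight="balanced", random_state=42)
model.fit(X, y)  # the Categorical dtype is split on natively
print(model.predict_proba(X.head(3))[:, 1])
```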
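For interpretation, SHAP values from a tree explainer give a global feature ranking. A self-contained sketch on synthetic data:

```python
# Sketch: ranking features by mean absolute SHAP value for a tree model.
import numpy as np
import shap
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=8, random_state=42)
model = LGBMClassifier(random_state=42).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
# Older shap versions return a per-class list for binary models.
if isinstance(shap_values, list):
    shap_values = shap_values[1]
print(np.abs(shap_values).mean(axis=0))  # mean |SHAP| = global importance
```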
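HalvingGridSearchCV is still experimental in scikit-learn and requires an explicit enabling import. A sketch with an illustrative parameter grid, scored with the F2 metric (which weights recall over precision) that the post evaluates against:

```python
# Sketch: successive-halving search over an illustrative LightGBM grid.
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import HalvingGridSearchCV

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=42)

search = HalvingGridSearchCV(
    LGBMClassifier(class_weight="balanced", random_state=42),
    {"num_leaves": [15, 31, 63], "learning_rate": [0.05, 0.1]},
    scoring=make_scorer(fbeta_score, beta=2),  # F2: recall-weighted
    factor=2,  # keep the better half of the candidates each round
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```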
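Finally, deploying the model as a Snowflake UDF with Snowpark usually means registering a Python handler that loads a staged model file. The connection parameters, stage name, model file, and feature columns below are all hypothetical:

```python
# Sketch: registering a scoring UDF with Snowpark.
# Credentials, the "@model_stage" stage, model.joblib, and the two input
# features are hypothetical placeholders.
import sys

import joblib
from snowflake.snowpark import Session
from snowflake.snowpark.types import FloatType

session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
}).create()

def score(engaged_time: float, page_views: float) -> float:
    # Runs inside Snowflake: load the model staged alongside the UDF.
    # A production handler would cache the load instead of repeating it per row.
    import_dir = sys._xoptions["snowflake_import_directory"]
    model = joblib.load(import_dir + "model.joblib")
    return float(model.predict_proba([[engaged_time, page_views]])[0][1])

session.udf.register(
    score,
    name="propensity_score",
    return_type=FloatType(),
    input_types=[FloatType(), FloatType()],
    packages=["scikit-learn", "lightgbm", "joblib"],
    imports=["@model_stage/model.joblib"],
    is_permanent=True,
    stage_location="@model_stage",
    replace=True,
)
```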