Evolving our machine learning to stop mobile bots

What's this blog post about?

This article describes how Cloudflare has evolved its bot management toolset in response to changing traffic patterns and growing mobile app usage. The company started with a static machine learning (ML) detection model that relied on common bot user agents to identify bad bots. As attackers became more sophisticated, Cloudflare generated new sets of model features, and its heuristics were able to accurately identify various types of bad bots.

The authors then explain how Cloudflare builds and deploys its ML models. For data gathering and preparation, the team draws on the volume and variety of traffic on Cloudflare's network to create training datasets, identifies samples that are clearly bots or not-bots, performs statistical analysis of the features, and uses the ANOVA F-value for feature selection. For model building and evaluation, it uses an internal pipeline backed by Airflow and the CatBoost library to train binary classification models. Before a new model is deployed to customer traffic, Cloudflare performs offline monitoring and uses SHAP Explainer to interpret the model's predictions. Models are then tested in shadow mode and in active mode before being officially released as stable.

The authors highlight how incorporating validated mobile request datasets into model training improved the model's performance on mobile app traffic, resulting in a significant reduction in false positive rates for Android traffic and other edge cases. The article concludes by emphasizing that effective bot management tools depend on continuous improvement and adaptation to changing traffic patterns.
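To make the feature selection step more concrete, here is a minimal sketch of ranking features by their one-way ANOVA F-value, written in plain Python. It is not Cloudflare's pipeline: the feature names (`requests_per_minute`, `header_count`), the toy values, and the helper functions are all hypothetical and chosen only for illustration. The idea is that a feature whose class means differ a lot relative to the spread within each class gets a high F-value and is a stronger candidate for the model.

```python
from statistics import mean


def anova_f_value(groups):
    """One-way ANOVA F-statistic for one feature, given its values per class."""
    k = len(groups)                      # number of classes
    n = sum(len(g) for g in groups)      # total number of samples
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: distance of each class mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of samples around their own class mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)      # assumes some within-class variation
    return ms_between / ms_within


def rank_features(samples, labels):
    """Return (feature_name, F) pairs sorted from most to least discriminative."""
    scores = {}
    for name in samples[0]:
        by_class = {}
        for row, label in zip(samples, labels):
            by_class.setdefault(label, []).append(row[name])
        scores[name] = anova_f_value(list(by_class.values()))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: names and values are made up, not real traffic.
samples = [
    {"requests_per_minute": 120.0, "header_count": 9.0},
    {"requests_per_minute": 150.0, "header_count": 12.0},
    {"requests_per_minute": 135.0, "header_count": 10.0},
    {"requests_per_minute": 4.0, "header_count": 11.0},
    {"requests_per_minute": 6.0, "header_count": 9.0},
    {"requests_per_minute": 5.0, "header_count": 13.0},
]
labels = ["bot", "bot", "bot", "not_bot", "not_bot", "not_bot"]

ranking = rank_features(samples, labels)
for name, f_value in ranking:
    print(f"{name}: F = {f_value:.2f}")
```

In this toy data the request rate separates the two classes far better than the header count, so it ranks first. A production pipeline would more likely rely on an established statistics library and far larger labeled datasets than on a hand-written function like this one.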

Company
Cloudflare

Date published
March 17, 2022

Author(s)
Arushi Shah, Reid Tatoris

Word count
1690

Hacker News points
None found.

Language
English