Company
Date Published
Author
Wen Yao, Jeet Nagda, Akshit Annadi, & Rohan Sriram
Word count
2235
Language
English
Hacker News points
None

Summary

Plaid's Bank Income product uses machine learning models to extract and categorize income data from a consumer's bank transactions. The models work as a pipeline: they filter out non-income transactions, cluster similar transactions into source streams, detect the frequency of each stream, predict the probability that a stream is income, and assign each income source to one of 13 categories. The resulting list of potential income sources is presented to the consumer to choose from. The models are designed to be efficient, scalable, and robust, and draw on four kinds of features: transaction description embeddings, categorical data, time series, and source context. The product is reported to save users 17% of selection time while keeping the total income shared consistent. Reported per-category Average Precision (AP) scores include 0.969 for SALARY and 0.739 for LONG_TERM_DISABILITY. A retraining pipeline that includes hyperparameter tuning and evaluation lets the models learn and capture new signals over time.
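
For readers unfamiliar with the Average Precision figures quoted in the summary, the following is a minimal sketch of how AP is commonly computed for a single category such as SALARY. It is not Plaid's evaluation code. The function name `average_precision` and the toy labels and scores are hypothetical and chosen only for illustration. It assumes the standard non-interpolated definition: rank predictions by score, then average the precision at each rank where a true positive appears.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated Average Precision for one binary category.

    labels: 1 if the item truly belongs to the category, else 0.
    scores: model-assigned probability for the category (higher = more likely).
    Illustrative helper only; not taken from the source article.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("AP is undefined when there are no positive labels")

    # Rank items from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision at this rank: true positives seen so far / items seen so far.
            precision_sum += hits / rank
    return precision_sum / total_positives


# Toy data (hypothetical): two true SALARY streams among four candidates.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
```

An AP of 1.0 means every true member of the category is ranked above every non-member. A lower value, such as the 0.739 reported for LONG_TERM_DISABILITY, indicates that some non-members score higher than true members for that category.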