How Plaid parses transaction data

Post Details

Company

Plaid

Date Published

Oct. 22, 2020

Author

Chris Jin

Word Count

1,311

Language

English

Hacker News Points

-

Source URL

plaid.com/blog/how-plaid-parses-transaction-data

Summary

Plaid combines natural language processing (NLP) techniques with machine learning to parse transaction data from financial institutions, which lets the company standardize and normalize that data for use across multiple accounts. The core task is named entity recognition (NER): identifying merchants and locations in unstructured text such as transaction descriptions. Plaid's solution uses a masked language model and a bidirectional long short-term memory (LSTM) model. The masked language model uses transformer encoders to capture the contextual information of the input sequences, while the bidirectional LSTM reads each sequence both forwards and backwards so that every token is interpreted with context from both directions. Together, these models have yielded promising results for location and merchant parsing, with a reported accuracy of 95% at the time of publication.
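To make the NER step more concrete, here is a minimal sketch of how per-token tags from a sequence model might be grouped into merchant and location spans. It assumes the common BIO tagging scheme (B- begins an entity, I- continues it, O is outside any entity). The tag names, the `decode_bio` helper, and the sample description are hypothetical illustrations, not Plaid's actual labels, code, or data.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans.

    Hypothetical helper for illustration; the tag scheme is an assumption.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the entity collected so far, if any, and reset the buffer.
        nonlocal current_type, current_tokens
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag extends the open entity of the same type.
            current_tokens.append(token)
        else:
            # "O" or a stray I- tag closes any open entity and is skipped.
            flush()
    flush()
    return entities


# Made-up transaction description and model output, for illustration only.
tokens = ["MCDONALD'S", "F1234", "SAN", "FRANCISCO", "CA"]
tags = ["B-MERCHANT", "O", "B-LOCATION", "I-LOCATION", "I-LOCATION"]
parsed = decode_bio(tokens, tags)
print(parsed)
```

In this sketch the tagger itself (whether a transformer-based masked language model or a bidirectional LSTM) is out of scope; only the post-processing that turns its token-level predictions into structured merchant and location fields is shown.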