Content Deep Dive
How we accidentally discovered personal data in a popular Kaggle dataset
Blog post from Gretel.ai
Post Details
Company
Date Published
Author: John Myers
Word Count: 923
Language: English
Hacker News Points: 1
Summary
Upcoming features in the Gretel Public Beta include automatic data labeling, which uses Natural Language Processing (NLP) and neural-network-based entity recognition for names and addresses, along with managed regular expressions and custom extractors. These features make it possible to discover personally identifiable information (PII), such as full names and email addresses, in datasets like Lending Club's financial dataset on Kaggle. Gretel aims to help developers share data more safely by providing workflows for understanding what a dataset contains and making informed decisions about whether it is safe to share.
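As a rough illustration of the regular-expression side of this kind of PII discovery, the sketch below scans the string fields of a few records for email-like values. It is not Gretel's API or implementation: the `find_emails` helper, the field names, and the sample records are all hypothetical, and the pattern is deliberately simple rather than a complete email grammar.

```python
import re

# Deliberately simple email pattern for illustration only; it does not cover
# every address form that the email standards allow.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(rows):
    """Return (row_index, field_name, match) for each email-like string found."""
    findings = []
    for i, row in enumerate(rows):
        for field, value in row.items():
            # Only free-text fields can hide an email address.
            if not isinstance(value, str):
                continue
            for match in EMAIL_RE.findall(value):
                findings.append((i, field, match))
    return findings


# Hypothetical records; the field names and values are invented for this sketch.
sample_rows = [
    {"amount": 5000, "description": "Borrower added: contact me at jane.doe@example.com"},
    {"amount": 12000, "description": "Consolidating credit card debt."},
]

findings = find_emails(sample_rows)
for row_index, field, value in findings:
    print(f"row {row_index}, field {field!r}: {value}")
```

A pattern like this only catches well-structured identifiers such as email addresses. Full names and street addresses have no fixed shape, which is why the summary pairs regular expressions with entity recognition.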