
Implementing Text PII Anonymization

Blog post from Arize

Post Details
Company: Arize
Author: Jason Lopatecki
Word Count: 442
Language: English
Summary

Microsoft Presidio is an open-source project for managing and governing sensitive data, including personally identifiable information (PII). It detects PII using named entity recognition, regular expressions, rule-based logic, and checksums, taking the surrounding context into account across multiple languages, and it can also call external PII detection models. Its two main components are AnalyzerEngine, which scans text to identify PII, and AnonymizerEngine, which replaces the identified PII with anonymized values. To anonymize conversations in a chatbot system, the post imports the necessary dependencies, initializes the analyzer and anonymizer, defines a function that finds and redacts the relevant PII, and applies that function to each row of a pandas DataFrame to create a new column of anonymized text.
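
The workflow summarized above could be sketched roughly as follows. This is a minimal illustration, not the post's actual code: the helper names (build_redactor, anonymize_column, demo), the column names (prompt, prompt_anonymized), the sample rows, and the choice of entity types are all assumptions made for the example. It assumes the presidio-analyzer and presidio-anonymizer packages, a spaCy English model for Presidio's default NLP engine, and pandas are installed.

```python
from typing import Any, Callable, Sequence

# Entity types to look for. An illustrative choice, not the post's list.
DEFAULT_ENTITIES = ("PERSON", "PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD")


def build_redactor(entities: Sequence[str] = DEFAULT_ENTITIES,
                   language: str = "en") -> Callable[[Any], Any]:
    """Return a function that replaces detected PII with placeholders."""
    # Imported lazily so this module loads even where Presidio is absent.
    from presidio_analyzer import AnalyzerEngine
    from presidio_anonymizer import AnonymizerEngine

    analyzer = AnalyzerEngine()      # scans text and reports PII spans
    anonymizer = AnonymizerEngine()  # rewrites the reported spans

    def redact(text: Any) -> Any:
        if not isinstance(text, str):
            return text  # leave empty or non-text cells untouched
        results = analyzer.analyze(
            text=text, entities=list(entities), language=language
        )
        # With no custom operators, each span is replaced by a placeholder
        # naming its entity type.
        return anonymizer.anonymize(text=text, analyzer_results=results).text

    return redact


def anonymize_column(df, source: str, target: str,
                     redact: Callable[[Any], Any]):
    """Apply `redact` to every row of `source`, storing results in `target`."""
    df[target] = df[source].apply(redact)
    return df


def demo() -> None:
    """Hypothetical end-to-end run on a two-row chatbot log."""
    import pandas as pd

    df = pd.DataFrame({"prompt": [
        "Hi, I'm Jane Doe and my phone number is 212-555-0123.",
        "What is the weather like today?",
    ]})
    df = anonymize_column(df, "prompt", "prompt_anonymized", build_redactor())
    print(df[["prompt", "prompt_anonymized"]])
```

Building the engines once in build_redactor, rather than inside the per-row function, avoids reloading the NLP model for every row. Calling demo() runs the whole pipeline, and the original column can be dropped afterwards if only the anonymized text should be kept.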