Quantifying PII Exposure in Synthetic Data

Post Details

Company

Gretel.ai

Date Published

Nov. 22, 2024

Author

Alexa Haushalter

Word Count

2,382

Company Posts That Month

7

Language

English

Hacker News Points

-

Source URL

gretel.ai/blog/quantifying-pii-exposure-in-synthetic-data

Summary

Gretel's PII Replay is a new privacy metric that identifies sensitive values in the original training data and counts how often those values reappear in the synthetic output. It complements Membership Inference Protection and Attribute Inference Protection as part of keeping synthetic data private by design. Because the metric uses Gretel Transform to detect and classify PII in the original training data, users can see whether any of that PII shows up in their synthetic data. Strategies to minimize PII replay include running Transform before generating synthetics, choosing a model designed to minimize PII replay, applying differential privacy, removing unnecessary columns during pre-processing, and applying pre- and post-processing steps deliberately.
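
The counting idea behind a replay metric can be illustrated with a small sketch. This is not Gretel's implementation or API. The function name `pii_replay_report`, the row format (a list of dictionaries), and the exact-match comparison are all assumptions made for illustration, and the sketch assumes the PII columns have already been identified, which the summary attributes to Gretel Transform.

```python
from collections import Counter


def pii_replay_report(training_rows, synthetic_rows, pii_columns):
    """Hypothetical helper: count training PII values that reappear verbatim
    in synthetic rows. Exact string matching is an assumption of this sketch."""
    report = {}
    for column in pii_columns:
        # Distinct non-empty PII values seen in the training data.
        training_values = {row[column] for row in training_rows if row.get(column)}
        # How often each non-empty value occurs in the synthetic output.
        synthetic_counts = Counter(
            row[column] for row in synthetic_rows if row.get(column)
        )
        # Keep only the training values that also occur in the synthetic data.
        replayed = {
            value: synthetic_counts[value]
            for value in training_values
            if value in synthetic_counts
        }
        report[column] = {
            "unique_training_values": len(training_values),
            "replayed_unique_values": len(replayed),
            "replayed_occurrences": sum(replayed.values()),
        }
    return report


# Made-up example records, not data from the original post.
training = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]
synthetic = [
    {"name": "Ada Lovelace", "email": "x1@example.org"},
    {"name": "Ada Lovelace", "email": "x2@example.org"},
    {"name": "Grace Hopper", "email": "alan@example.com"},
]
report = pii_replay_report(training, synthetic, ["name", "email"])
```

In this toy data, one of the two training names is replayed twice and one of the two training emails is replayed once. A common name reappearing may be harmless, while a replayed email address is more likely to indicate a leak, so a count like this is a starting point for review rather than a verdict.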

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
AI Model Fine-tuning	9	547	127	59	-39%
LLM	2	2,876	370	130	-20%