How To Create Differentially Private Synthetic Data

Company

Gretel.ai

Date Published

Jan. 9, 2021

Author

Alex Watson

Word count

1073

Language

English

Hacker News points

URL

gretel.ai/blog/how-to-create-differentially-private-synthetic-data

Summary

This post is a practical guide to creating differentially private synthetic data with Python and TensorFlow. It demonstrates how to train a synthetic data model on the Netflix Prize dataset while protecting user identities with differential privacy. The goal is to generate new data in the same format as the source data, with stronger privacy guarantees, while retaining the statistical insights of the original. The post discusses parameter tuning approaches for finding optimal privacy parameters and presents experiments using the gretel-synthetics library and TensorFlow-Privacy. It also explores tuning the learning rate, l2_norm_clip, and noise_multiplier to improve model accuracy while maintaining privacy guarantees. The final section encourages readers to generate synthetic datasets from their own data using the provided Jupyter notebook.
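
As a rough illustration of what the three tuned parameters control, the sketch below shows one differentially private SGD update in plain Python. It is not the gretel-synthetics or TensorFlow-Privacy API and is not taken from the post. The function names (clip_gradient, dp_average_gradient, sgd_step) are hypothetical, and only the parameter names l2_norm_clip, noise_multiplier, and learning rate come from the summary. It assumes the standard DP-SGD recipe: clip each per-example gradient to an L2 norm of at most l2_norm_clip, sum the clipped gradients, add Gaussian noise with standard deviation noise_multiplier * l2_norm_clip, and average.

```python
import math
import random


def clip_gradient(grad, l2_norm_clip):
    """Scale grad down so its L2 norm is at most l2_norm_clip."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, l2_norm_clip / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, l2_norm_clip, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, l2_norm_clip) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise scale is tied to the clipping bound (assumed DP-SGD convention).
    stddev = noise_multiplier * l2_norm_clip
    noisy = [s + rng.gauss(0.0, stddev) for s in summed]
    return [n / len(per_example_grads) for n in noisy]


def sgd_step(weights, grad, learning_rate):
    """Plain gradient-descent update using the (noisy) averaged gradient."""
    return [w - learning_rate * g for w, g in zip(weights, grad)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up per-example gradients, for illustration only.
    grads = [[3.0, 4.0], [0.3, 0.4]]
    noisy_grad = dp_average_gradient(
        grads, l2_norm_clip=1.0, noise_multiplier=0.5, rng=rng
    )
    print(sgd_step([0.0, 0.0], noisy_grad, learning_rate=0.1))
```

In this sketch, a smaller l2_norm_clip limits how much any single record can move the model, and a larger noise_multiplier adds more noise relative to that bound. Both strengthen privacy at some cost to accuracy, which is the trade-off the tuning experiments in the post address.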