
Automatically Reducing AI Bias With Synthetic Data

What's this blog post about?

This blog post provides a step-by-step guide to using Gretel's SDK to create a fair, balanced, privacy-preserving version of the 1994 US Census dataset. The process involves balancing underrepresented classes, such as race, gender, and income bracket, in the dataset. The Python notebook used for this purpose can be applied to any imbalanced dataset. Gretel's SDK lets users choose between two modes: "full" mode generates a complete synthetic dataset with representation bias removed, while "additive" mode generates only the synthetic samples that remove bias when added to the original set. The blueprint also lets users view the distributions of existing categorical fields in the dataset and generate synthetic data for specific fields. Finally, users can save their new synthetic data locally or sync it back to a Gretel Project.
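The "additive" mode described above can be sketched in plain pandas: for each categorical field, count how many synthetic rows each class would need for every class to match the largest one. This is a minimal illustration of the idea, not Gretel's actual SDK; the toy records and the `additive_counts` helper are invented for this example, though the field names mirror those in the post.

```python
import pandas as pd

# Toy stand-in for the 1994 US Census data; the records are made up,
# and only the field names ("gender", "income_bracket") echo the post.
df = pd.DataFrame({
    "gender": ["Male"] * 6 + ["Female"] * 2,
    "income_bracket": ["<=50K"] * 5 + [">50K"] * 3,
})

def additive_counts(df, field):
    """Rows of synthetic data needed per class so that, when added to
    the original set, every class matches the largest one ("additive"
    mode); "full" mode would instead regenerate the whole dataset."""
    counts = df[field].value_counts()
    return (counts.max() - counts).to_dict()

print(additive_counts(df, "gender"))          # {'Male': 0, 'Female': 4}
print(additive_counts(df, "income_bracket"))  # {'<=50K': 0, '>50K': 2}
```

A synthetic model would then be asked to generate just those per-class counts, leaving already well-represented classes untouched.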

Company
Gretel.ai

Date published
Jan. 9, 2021

Author(s)
Amy Steier

Word count
679

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.