Reducing AI bias with Synthetic data

What's this blog post about?

This post explores using synthetic data to balance a biased health dataset on Kaggle and improve overall model accuracy. The Heart Disease dataset published by the University of California Irvine is imbalanced: male patient records account for 68% of the dataset and female patient records for only 32%. To reduce this bias in the input data, synthetic female patient records were generated using Gretel.ai's open-source synthetic data library. The augmented dataset was then run through ML algorithms on Kaggle, and the results were compared against those from the original training set. Accuracy increased for five of the six classification algorithms when they were trained on the augmented dataset: KNN reached 96.7% overall accuracy (up from 88.5%), and the Decision Tree classifier gained 13%.
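
The balancing arithmetic behind that summary can be sketched in a few lines of Python. This is only an illustration, not the post's code: the record counts are hypothetical values chosen to match the 68% / 32% split, the helper names are made up for this example, and an even 50/50 target is an assumption, since the summary does not say what final ratio was used.

```python
def synthetic_records_needed(majority_count: int, minority_count: int) -> int:
    """Number of synthetic minority records required for an even split."""
    return max(majority_count - minority_count, 0)


def minority_share(majority_count: int, minority_count: int,
                   synthetic_count: int = 0) -> float:
    """Fraction of the dataset made up of minority records after augmentation."""
    minority_total = minority_count + synthetic_count
    return minority_total / (majority_count + minority_total)


# Hypothetical counts (not from the post) matching the 68% / 32% split.
male_records, female_records = 680, 320

# Assumption: the goal is a 50/50 balance between the two groups.
needed = synthetic_records_needed(male_records, female_records)

share_before = minority_share(male_records, female_records)
share_after = minority_share(male_records, female_records, needed)

print(f"synthetic female records needed: {needed}")
print(f"female share before: {share_before:.2f}, after: {share_after:.2f}")
```

Under these assumed counts, the female share rises from roughly a third of the records to half once the synthetic records are added, which is the kind of rebalancing the post describes before retraining the classifiers.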

Company
Gretel.ai

Date published
Jan. 11, 2021

Author(s)
Alex Watson

Word count
870

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.