🛡️ Nemotron PII: Synthesized Data for Privacy-Preserving AI

Post Details

Company: HuggingFace
Date Published: Oct. 28, 2025
Author: Maarten Van Segbroeck
Word Count: 988
Company Posts That Month: 41
Language: -
Hacker News Points: -
Source URL: huggingface.co/blog/nvidia/nemotron-pii

Summary

NVIDIA has introduced Nemotron-PII, a synthetic dataset for training and evaluating AI models that handle sensitive text such as emails, chat logs, and clinical notes. Built with the NeMo Data Designer, the dataset contains 100,000 synthetic records covering more than 55 types of Personally Identifiable Information (PII) across more than 50 industries. It is paired with GLiNER-PII, an open-source model optimized for detecting PII and Protected Health Information (PHI), aimed at sectors such as healthcare, finance, and legal. Together, the dataset and model are intended to help organizations comply with regulations such as HIPAA and GDPR by providing a high-quality, scalable foundation for de-identification and redaction workflows. Nemotron-PII is released under a CC BY 4.0 license, which permits both non-commercial and commercial use with attribution. Because every record is synthetic, NVIDIA presents it as a way to reach enterprise-grade accuracy without exposing real PII. The initiative reflects NVIDIA's stated commitment to trustworthy AI: building privacy-focused tooling into data pipelines and encouraging the use of synthetic data to maintain privacy standards.
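
To make the redaction step of such a workflow concrete, here is a minimal sketch in Python. It assumes a detector (GLiNER-PII or any other) has already returned character spans for each PII entity. The `redact` function, the `(start, end, label)` span layout, and the sample note are hypothetical illustrations. They are not part of the Nemotron-PII release or of any GLiNER API.

```python
from typing import List, Tuple

# Hypothetical span layout: (start, end, label), with `end` exclusive.
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span]) -> str:
    """Replace each detected span with a [LABEL] placeholder.

    Spans are processed left to right; a span that overlaps one
    already redacted is skipped rather than merged.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # overlaps a span that was already replaced
        pieces.append(text[cursor:start])   # untouched text before the span
        pieces.append(f"[{label.upper()}]")  # placeholder for the entity
        cursor = end
    pieces.append(text[cursor:])  # remainder after the last span
    return "".join(pieces)


def span_of(text: str, value: str, label: str) -> Span:
    """Stand-in for a detector: locate a known value in the text."""
    start = text.index(value)
    return (start, start + len(value), label)


# Synthetic example text; no real person or address is referenced.
note = "Patient Jane Doe, reachable at jane@example.com."
spans = [
    span_of(note, "Jane Doe", "name"),
    span_of(note, "jane@example.com", "email"),
]
redacted = redact(note, spans)
print(redacted)  # Patient [NAME], reachable at [EMAIL].
```

Replacing spans with typed placeholders, rather than deleting them, keeps the sentence readable and preserves which kind of entity was removed. Whether overlapping spans should be skipped or merged is a design choice that depends on the detector in use.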

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
AI Model Fine-tuning	3	762	158	56	+176%