NVIDIA has introduced Nemotron-PII, a synthetic dataset for training and evaluating AI models that must handle sensitive text such as emails, chat logs, and clinical notes, without exposing real personal data. Built with NeMo Data Designer, the dataset contains 100,000 synthetic records covering more than 55 types of Personally Identifiable Information (PII) across more than 50 industries. It is paired with GLiNER-PII, an open-source model optimized for detecting PII and Protected Health Information (PHI), aimed at sectors such as healthcare, finance, and legal. Together, the dataset and model are intended to help organizations meet regulatory requirements such as HIPAA and GDPR by providing a high-quality, scalable foundation for de-identification and redaction workflows. Nemotron-PII is released under a CC BY 4.0 license, which permits both non-commercial and commercial use with attribution, so teams can pursue enterprise-grade accuracy without the risk of exposing real PII. The initiative reflects NVIDIA's stated commitment to trustworthy AI: building privacy safeguards into data pipelines and encouraging the use of synthetic data to maintain privacy standards.
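To make the redaction step of such a workflow concrete, here is a minimal Python sketch of what happens after detection. It does not call GLiNER-PII or any NVIDIA API. It assumes, purely for illustration, that a detector has already returned character spans as dictionaries with `start`, `end`, and `label` keys, a shape common to span-based entity recognizers. The function names `redact` and `span_for` and the labels `name` and `email` are hypothetical.

```python
from typing import Dict, List


def span_for(text: str, needle: str, label: str) -> Dict:
    """Build a span dict for the first occurrence of `needle` (illustrative helper).

    Stands in for detector output; a real pipeline would get spans from a model.
    """
    start = text.index(needle)
    return {"start": start, "end": start + len(needle), "label": label}


def redact(text: str, spans: List[Dict]) -> str:
    """Replace each span with a [LABEL] placeholder.

    Assumed span format (not a documented API): {"start": int, "end": int, "label": str},
    with `end` exclusive.
    """
    out = text
    # Work right to left so replacements do not shift the offsets of earlier spans.
    last_start = len(text)
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        # Skip a span that overlaps one already replaced to its right.
        if span["end"] > last_start:
            continue
        placeholder = "[" + span["label"].upper() + "]"
        out = out[: span["start"]] + placeholder + out[span["end"] :]
        last_start = span["start"]
    return out


if __name__ == "__main__":
    sample = "Contact Jane Doe at jane.doe@example.com."
    detected = [
        span_for(sample, "Jane Doe", "name"),
        span_for(sample, "jane.doe@example.com", "email"),
    ]
    print(redact(sample, detected))
```

Replacing spans from right to left keeps the character offsets of earlier spans valid, and dropping overlapping spans avoids corrupting the text when a detector reports nested entities. A production pipeline would also need a policy for which overlapping span to keep and might substitute synthetic surrogates instead of bracketed labels.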