
Nemotron-Personas-Brazil: Co-Designed Data for Sovereign AI

Blog post from Hugging Face

Post Details
Author: Andre Manoel, Yev Meyer, Shyamala Prayaga, Will Jennings, and bardiya sadeghi
Word Count: 903
Summary

Nemotron-Personas-Brazil is an open dataset of 6 million synthetic personas that reflect Brazil's diverse population, grounded in data from the Brazilian Institute of Geography and Statistics (IBGE). Developed by NVIDIA in collaboration with WideLabs, it is intended for Brazilian developers and researchers building culturally informed AI applications, and it addresses the limitations of English-centric training data. Each persona is built with NVIDIA's NeMo Data Designer and includes attributes such as age, sex, education, occupation, and location, all written in natural Brazilian Portuguese. Because the dataset is fully synthetic, it preserves privacy; it is released under the CC BY 4.0 license, with the aim of democratizing access to culturally authentic AI training data and supporting sovereign AI development in Brazil.
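
To make the persona attributes listed above more concrete, here is a minimal Python sketch of how such a record might be represented and queried. The `Persona` class, its field names, the helper functions, and the sample records are all hypothetical and made up for illustration; they are not the dataset's actual schema or contents.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    # Field names are illustrative assumptions, not the dataset's real schema.
    age: int
    sex: str
    education: str
    occupation: str
    location: str     # hypothetical: a Brazilian state abbreviation such as "SP"
    description: str  # free-text persona written in Brazilian Portuguese


def count_by_location(personas):
    """Count personas per location, e.g. to compare against census shares."""
    return Counter(p.location for p in personas)


def filter_by_age(personas, min_age, max_age):
    """Return personas whose age falls within [min_age, max_age], inclusive."""
    return [p for p in personas if min_age <= p.age <= max_age]


# Made-up records for illustration only; they do not come from the dataset.
sample = [
    Persona(34, "F", "Ensino superior completo", "Enfermeira", "BA",
            "Enfermeira em Salvador que gosta de cozinhar aos domingos."),
    Persona(52, "M", "Ensino médio completo", "Motorista", "SP",
            "Motorista em Campinas que acompanha futebol com os amigos."),
    Persona(27, "F", "Ensino superior completo", "Desenvolvedora", "SP",
            "Desenvolvedora em São Paulo interessada em dados abertos."),
]

by_state = count_by_location(sample)
thirties = filter_by_age(sample, 30, 39)
```

A structure like this could be used to check that a sampled subset keeps roughly the demographic mix one expects, although the real column names should be taken from the dataset card rather than from this sketch.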