Nemotron-Personas-Brazil: Co-Designed Data for Sovereign AI
Blog post from Hugging Face
Nemotron-Personas-Brazil is an open dataset of 6 million synthetic personas built to improve AI systems for Brazil. The personas reflect the country's diverse population and are grounded in data from the Brazilian Institute of Geography and Statistics (IBGE). NVIDIA developed the dataset in collaboration with WideLabs so that Brazilian developers and researchers can build culturally informed AI applications, addressing the limitations of English-centric training data. Each persona is generated with NVIDIA's NeMo Data Designer and includes attributes such as age, sex, education, occupation, and location, all written in natural Brazilian Portuguese. Because the dataset is fully synthetic, it preserves privacy. It is available under the CC BY 4.0 license, with the aim of democratizing access to culturally authentic AI training data and supporting sovereign AI development in Brazil.
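To make the persona attributes above concrete, here is a minimal Python sketch of how one record might be modeled and queried. The field names, the `Persona` class, the helper functions, and the three sample rows are all hypothetical placeholders written for illustration. They are not taken from the dataset, whose actual schema may differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Persona:
    """One synthetic persona. Field names are assumptions, not the real schema."""
    age: int
    sex: str
    education: str
    occupation: str
    location: str
    description: str  # free-text persona, written in Brazilian Portuguese


def filter_personas(
    personas: Iterable[Persona],
    *,
    min_age: int = 0,
    location: Optional[str] = None,
) -> List[Persona]:
    """Keep personas at or above min_age and, if given, in one location."""
    return [
        p for p in personas
        if p.age >= min_age and (location is None or p.location == location)
    ]


def count_by(personas: Iterable[Persona], attribute: str) -> Counter:
    """Count personas by the value of a single attribute, e.g. 'sex'."""
    return Counter(getattr(p, attribute) for p in personas)


# Hand-written placeholder rows for illustration only (not dataset records).
SAMPLE = [
    Persona(34, "feminino", "ensino superior completo", "enfermeira",
            "Bahia", "Enfermeira dedicada que trabalha em um hospital."),
    Persona(52, "masculino", "ensino médio completo", "motorista",
            "São Paulo", "Motorista experiente que conhece bem a cidade."),
    Persona(27, "feminino", "ensino superior completo", "professora",
            "São Paulo", "Professora que ensina matemática."),
]

if __name__ == "__main__":
    for persona in filter_personas(SAMPLE, min_age=30, location="São Paulo"):
        print(persona.occupation, persona.age)
    print(count_by(SAMPLE, "education"))
```

A typed record like this makes it easy to slice the personas by demographic attribute before using them, for example to check that a sampled subset still covers the locations or education levels an application cares about.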