Building Tucano 2: Open-Source Language Models That Actually Think in Portuguese

Post Details

Company

Hugging Face

Date Published

March 5, 2026

Authors

Nicholas Kluge Corrêa, Aniket Sen, Shiza Fatimah, Sophia Falk, and Lucie Flek

Word Count

2,258

Company Posts That Month

63

Post removed?

No

Source URL

huggingface.co/blog/Polygl0t/tucano2

Summary

Tucano 2 is a family of open-source language models built specifically for Portuguese, responding to the limited transparency and weak Portuguese optimization of existing multilingual models. Developed with a focus on openness and collaboration, the models range from 0.5 billion to 3.7 billion parameters and outperform prior Portuguese models of similar size. Development involved creating a large, high-quality Portuguese corpus, GigaVerbo-v2, and a custom tokenizer optimized for Portuguese, which significantly reduces computational costs. The models were trained on a blend of educational and synthetic data and evaluated with a new two-tier suite designed to provide reliable benchmarks for Portuguese. The project also reports its energy consumption and environmental costs openly, covering both carbon emissions and the material footprint associated with GPU usage. All datasets, models, and tools are released under permissive licenses to invite further research and development in Portuguese natural language processing.

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	5	6,078	960	218	+18%
AI Model Fine-tuning	3	906	165	54	-16%
AI Agents	1	4,545	963	231	+27%
RAG	1	1,806	326	91	+5%
Reinforcement learning	1	121	52	29	-1%
Vector Search	1	2,370	415	145	+7%
