
Fine-tune & Run Gemma 3n

Blog post from Unsloth

Post Details
Company: Unsloth
Author: Daniel & Michael
Word Count: 1,368
Language: English
Hacker News Points: -
Summary

Gemma 3n, Google's new family of multimodal models supporting text, vision, and audio, comes in 2B and 4B sizes with a 32K context window and multilingual support, and is now supported in the Unsloth framework, which uniquely enables both inference and training on float16 GPUs. The models initially produced NaNs and infinities on FP16 GPUs; this was mitigated by upcasting certain operations to float32, which increases VRAM usage, so Unsloth introduced autocasting to handle the upcasts efficiently. Gemma 3n's architecture also reuses hidden states, which limits gradient checkpointing but still permits other compiler optimizations. Its MatFormer architecture nests progressively smaller transformer layers inside larger ones, allowing smaller sub-networks to be extracted for different needs without additional training. Fine-tuning shows large initial losses that decrease over time, and accuracy is further improved by using dynamic 4-bit quants. The community is encouraged to engage with Unsloth through its various platforms, reflecting ongoing collaboration and support with the Gemma team.
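The float32-upcasting fix described above can be illustrated with a short sketch. This is not Unsloth's actual code; it is a minimal, hypothetical example of the general pattern: an RMSNorm-style reduction whose squared intermediates can overflow float16's ~65504 maximum is computed in float32 and cast back to the input dtype.

```python
import torch

def stable_rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Hypothetical sketch of the upcast-to-float32 pattern (not Unsloth's code).
    # Squaring large fp16 activations overflows to inf, so the reduction is
    # done in float32 and the result is cast back to the original dtype.
    x32 = x.float()
    normed = x32 * torch.rsqrt(x32.pow(2).mean(-1, keepdim=True) + eps)
    return (normed * weight.float()).to(x.dtype)

# Activations near the fp16 maximum (65504): squaring them in fp16 gives inf.
x = torch.full((2, 8), 60000.0, dtype=torch.float16)
w = torch.ones(8)
out = stable_rmsnorm(x, w)
print(torch.isfinite(out).all().item())  # finite output despite fp16 inputs
```

The trade-off the summary mentions follows directly: the temporary float32 copies cost extra VRAM, which is why selective autocasting of only the fragile operations is preferable to upcasting everything.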