
Fine-tune & Run Gemma 3n

Blog post from Unsloth

Post Details
Company: Unsloth
Author: Daniel & Michael
Word Count: 1,368
Language: English
Hacker News Points: -
Summary

Gemma 3n, Google's new family of multimodal models supporting text, vision, and audio, comes in 2B and 4B sizes with a 32K context window and multilingual support, and is now supported in the Unsloth framework, which uniquely enables both inference and training on float16 GPUs. The models initially produced NaNs and infinities on FP16 GPUs; this was mitigated by upcasting certain operations to float32, which increases VRAM usage, so Unsloth introduced autocasting to handle the upcasts efficiently. Gemma 3n's architecture also reuses hidden states, which limits gradient checkpointing but still permits other compiler optimizations. Its MatFormer architecture nests progressively smaller transformer layers inside larger ones, allowing smaller sub-networks to be extracted for different needs without additional training. Fine-tuning shows large initial losses that decrease over time, and accuracy is further improved by using dynamic 4-bit quants. The community is encouraged to engage with Unsloth through its various platforms, reflecting ongoing collaboration and support with the Gemma team.
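The float32-upcasting fix described above can be illustrated with a short sketch. This is not Unsloth's actual code; it is a minimal, hypothetical example of the general pattern: an RMSNorm-style reduction whose squared intermediates can overflow float16's ~65504 maximum is computed in float32 and cast back to the input dtype.

```python
import torch

def stable_rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Hypothetical sketch of the upcast-to-float32 pattern (not Unsloth's code).
    # Squaring large fp16 activations overflows to inf, so the reduction is
    # done in float32 and the result is cast back to the original dtype.
    x32 = x.float()
    normed = x32 * torch.rsqrt(x32.pow(2).mean(-1, keepdim=True) + eps)
    return (normed * weight.float()).to(x.dtype)

# Activations near the fp16 maximum (65504): squaring them in fp16 gives inf.
x = torch.full((2, 8), 60000.0, dtype=torch.float16)
w = torch.ones(8)
out = stable_rmsnorm(x, w)
print(torch.isfinite(out).all().item())  # finite output despite fp16 inputs
```

The trade-off the summary mentions follows directly: the temporary float32 copies cost extra VRAM, which is why selective autocasting of only the fragile operations is preferable to upcasting everything.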