Fine-tune & Run Llama 4
Blog post from Unsloth
Meta's Llama 4 models, including Llama 4 Scout and Llama 4 Maverick, can now be fine-tuned and run with Unsloth, which uniquely supports QLoRA 4-bit training for them. Unsloth makes Llama 4 fine-tuning 1.5x faster, cuts VRAM usage by 50%, and allows context lengths 8x longer than setups using Flash Attention 2.

The models are available in several dynamic quantized versions on Hugging Face, sized for different VRAM budgets; the 4-bit and 8-bit variants specifically require Unsloth for training and inference. Unsloth supports a wide range of transformer-style models and training algorithms, with significant VRAM savings and strong performance benchmarks. Llama 4 Scout in particular has been tested with QLoRA fine-tuning on an 80GB A100 GPU.
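As a rough illustration of why QLoRA 4-bit quantization is what makes an 80GB A100 viable for Llama 4 Scout, here is a back-of-envelope sketch of the weight memory at different precisions. The ~109B total parameter count for Scout (17B active across 16 experts) is an assumption based on Meta's published spec, not a figure stated in this post:

```python
# Back-of-envelope weight memory for Llama 4 Scout at different precisions.
# ASSUMPTION: ~109B total parameters (17B active, 16 experts), per Meta's
# published spec; this figure does not appear in the post above.

def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, ignoring
    activations, optimizer state, KV cache, and LoRA adapters."""
    return num_params * bits_per_param / 8 / 1e9

SCOUT_TOTAL_PARAMS = 109e9  # assumed total parameter count

fp16 = weight_memory_gb(SCOUT_TOTAL_PARAMS, 16)  # ~218 GB: far over 80 GB
int4 = weight_memory_gb(SCOUT_TOTAL_PARAMS, 4)   # ~54.5 GB: fits an 80GB A100

print(f"fp16 weights:  {fp16:.1f} GB")
print(f"4-bit weights: {int4:.1f} GB")
```

This counts weights only; real fine-tuning also needs activations and optimizer state, which is where Unsloth's reported 50% VRAM reduction and longer-context support come in on top of the 4-bit quantization.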