Fine-tune Llama 3.3 with Unsloth
Blog post from Unsloth
Unsloth has developed a method for fine-tuning Meta's Llama 3.3 (70B) model that makes training 2x faster and uses 70% less memory compared to Hugging Face with Flash Attention 2 (HF+FA2). In collaboration with Apple, Unsloth incorporates Apple's new Cut Cross Entropy (CCE) algorithm, a memory-efficient cross-entropy kernel, alongside Unsloth's own gradient checkpointing algorithm. Together, these optimizations raise the supported context length to roughly 89,000 tokens, about 13x longer than what HF+FA2 can handle.

The same optimizations also run on older GPU models, making the approach usable across a range of computational environments. In addition, Unsloth uploads pre-quantized 4-bit models so downloads are faster and fine-tuning can begin immediately.

These results, validated through extensive testing and ongoing collaboration with Apple, reflect Unsloth's continued focus on memory-efficient, high-performance fine-tuning.
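To make the workflow concrete, here is a minimal sketch of loading one of Unsloth's pre-quantized 4-bit Llama 3.3 uploads and attaching LoRA adapters with Unsloth's gradient checkpointing enabled. The checkpoint name, sequence length, and LoRA settings are illustrative assumptions based on Unsloth's public API, not values taken from this post.

```python
# Minimal sketch: load a pre-quantized 4-bit Llama 3.3 checkpoint with Unsloth
# and attach LoRA adapters. Model name and hyperparameters are assumptions.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.3-70B-Instruct-bnb-4bit",  # pre-quantized 4-bit upload
    max_seq_length=8192,   # can be raised much further given the long-context support
    load_in_4bit=True,     # 4-bit loading for reduced memory use
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                   # LoRA rank (illustrative)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    use_gradient_checkpointing="unsloth",   # Unsloth's memory-saving checkpointing
)
```

From here the model can be handed to a standard trainer such as TRL's SFTTrainer; the `use_gradient_checkpointing="unsloth"` flag is what activates the memory savings that enable the long-context fine-tuning described above.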