
Fine-tune Llama 3.3 with Unsloth

Blog post from Unsloth

Post Details
- Company: Unsloth
- Author: Daniel & Michael
- Word Count: 1,004
- Language: English
Summary

Unsloth has developed an optimized method for fine-tuning Meta's Llama 3.3 (70B) model, achieving 2x faster fine-tuning while using 70% less memory than Hugging Face with Flash Attention 2 (HF+FA2). Through a collaboration with Apple, Unsloth incorporates the new Cut Cross Entropy (CCE) algorithm alongside its own gradient-checkpointing algorithm, supporting context lengths of up to 89,000 tokens, roughly 13 times longer than HF+FA2 can handle. The optimizations also run on older GPU models, making the approach usable across a range of computational environments. Unsloth additionally uploads pre-quantized 4-bit models for faster downloads, and its memory-efficient cross-entropy kernel further extends context-length support, yielding the reported 12-13x increase over traditional methods. This work reflects Unsloth's ongoing focus on fine-tuning efficiency, backed by extensive testing and continued collaboration with industry partners such as Apple.
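To make the headline numbers concrete, here is a back-of-the-envelope check of the claims above. The baseline VRAM figure is a hypothetical illustration (not from the post); the 89,000-token context, 13x ratio, and 70% memory saving are the figures the summary reports.

```python
# Illustrative arithmetic only: checks what the reported ratios imply.

unsloth_context = 89_000          # max context length reported for Unsloth
context_ratio = 13                # "13 times longer than HF+FA2"
baseline_context = unsloth_context // context_ratio
print(baseline_context)           # implied HF+FA2 context: 6846 tokens

memory_saving = 0.70              # "70% less memory"
baseline_mem_gb = 80.0            # hypothetical peak VRAM for the baseline
unsloth_mem_gb = baseline_mem_gb * (1 - memory_saving)
print(unsloth_mem_gb)             # 24.0 GB under that assumption
```

In other words, a 13x gain implies the HF+FA2 baseline tops out near 6,800 tokens on the same hardware, and a 70% saving turns a hypothetical 80 GB peak into roughly 24 GB.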