Fine-tune Llama 3.3 with Unsloth
Blog post from Unsloth
Unsloth has developed a method for fine-tuning Meta's Llama 3.3 (70B) model that makes training 2x faster and uses 70% less memory compared to Hugging Face with Flash Attention 2 (HF+FA2). In collaboration with Apple, Unsloth incorporates Apple's new Cut Cross Entropy (CCE) algorithm, a memory-efficient cross-entropy kernel, alongside Unsloth's own gradient checkpointing algorithm. Together, these optimizations raise the supported context length to roughly 89,000 tokens, about 13x longer than what HF+FA2 can handle.

The same optimizations also run on older GPU models, making the approach usable across a range of computational environments. In addition, Unsloth uploads pre-quantized 4-bit models so downloads are faster and fine-tuning can begin immediately.

These results, validated through extensive testing and ongoing collaboration with Apple, reflect Unsloth's continued focus on memory-efficient, high-performance fine-tuning.
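To make the workflow concrete, here is a minimal sketch of loading one of Unsloth's pre-quantized 4-bit Llama 3.3 uploads and attaching LoRA adapters with Unsloth's gradient checkpointing enabled. The checkpoint name, sequence length, and LoRA settings are illustrative assumptions based on Unsloth's public API, not values taken from this post.

```python
# Minimal sketch: load a pre-quantized 4-bit Llama 3.3 checkpoint with Unsloth
# and attach LoRA adapters. Model name and hyperparameters are assumptions.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.3-70B-Instruct-bnb-4bit",  # pre-quantized 4-bit upload
    max_seq_length=8192,   # can be raised much further given the long-context support
    load_in_4bit=True,     # 4-bit loading for reduced memory use
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                   # LoRA rank (illustrative)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    use_gradient_checkpointing="unsloth",   # Unsloth's memory-saving checkpointing
)
```

From here the model can be handed to a standard trainer such as TRL's SFTTrainer; the `use_gradient_checkpointing="unsloth"` flag is what activates the memory savings that enable the long-context fine-tuning described above.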