
Finetune Llama 3 - 2x faster + 6x longer context + 68% less VRAM

Blog post from Unsloth

Post Details
Company: Unsloth
Author: Daniel & Michael
Word Count: 891
Language: English
Summary

Unsloth has released updates that make fine-tuning Llama 3 significantly faster and more memory-efficient. Llama 3 (8B) can now be fine-tuned 2x faster with 63% less memory than Hugging Face with Flash Attention 2 (FA2), and Llama 3 (70B) trains 1.8x faster with a 68% reduction in VRAM. These savings also enable much longer context lengths: Llama-3 70B fits 48,000 tokens with Unsloth versus 7,000 without. A Colab notebook is available for fine-tuning the 8B model on a free Tesla T4 GPU, and pre-quantized models are provided for faster downloads. The update also addresses quirks such as the Llama 3 tokenizer not adding the BOS token, and untrained tokens in the base model that can cause problems during fine-tuning. The community is encouraged to support and engage with Unsloth via Discord and Twitter, and future updates such as Phi 3 support are planned.
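One of the quirks mentioned above is the Llama 3 tokenizer not prepending the BOS token (`<|begin_of_text|>`, token id 128000) to inputs, which fine-tuning pipelines typically expect. A minimal sketch of a guard against this, assuming a hypothetical helper name (`ensure_bos`) and hard-coded example token ids rather than Unsloth's actual fix:

```python
def ensure_bos(token_ids, bos_id):
    """Prepend the BOS token id if the tokenizer did not add it."""
    if not token_ids or token_ids[0] != bos_id:
        return [bos_id] + token_ids
    return token_ids

# Llama 3's <|begin_of_text|> has token id 128000.
BOS_ID = 128000

# Example tokenizer output that is missing the BOS token.
ids = [9906, 1917]
print(ensure_bos(ids, BOS_ID))  # [128000, 9906, 1917]
```

A guard like this is idempotent: calling it on a sequence that already starts with BOS leaves it unchanged, so it can be applied safely to every example in a dataset.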