Unsloth Gradient Checkpointing - 4x longer context windows
Blog post from Unsloth
Unsloth Gradient Checkpointing is a new algorithm that enables much longer context windows when fine-tuning large language models (LLMs): up to 228,199 tokens on an NVIDIA H100 80GB GPU and 56,420 tokens on an RTX 4090 24GB GPU. It reduces memory usage by 30% with minimal time overhead, making long-context fine-tuning significantly more efficient than previous methods. The algorithm works with any model architecture that uses gradient checkpointing, and it also allows a substantial increase in batch size.

Alongside the new checkpointing, improvements in tokenizer efficiency and RoPE embedding speed further boost performance, and newly introduced pre-quantized models cut both download time and VRAM usage. The developers also preview upcoming features such as an automatic model optimizer and a one-click Colab fine-tuning system, and they encourage community engagement and support through Discord and Ko-fi.
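In practice, the feature is typically switched on through a single flag when preparing a model for LoRA fine-tuning. The sketch below assumes Unsloth's `FastLanguageModel` API; the model name, sequence length, and LoRA hyperparameters are illustrative placeholders rather than values from the post, and exact argument names may vary by version.

```python
# Minimal sketch: enabling Unsloth gradient checkpointing in a QLoRA-style setup.
# Model name, sequence length, and LoRA settings are illustrative assumptions.
from unsloth import FastLanguageModel

max_seq_length = 56420  # e.g. a long-context target on a 24GB RTX 4090

# Pre-quantized 4-bit checkpoint: less to download and lower VRAM usage.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",
    max_seq_length=max_seq_length,
    dtype=None,          # auto-detect (bfloat16 on newer GPUs)
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    # "unsloth" selects Unsloth Gradient Checkpointing instead of the standard
    # implementation, trading a small time overhead for a large reduction in
    # activation memory at long sequence lengths and larger batch sizes.
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)
```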