
Long context gpt-oss Fine-tuning

Blog post from Unsloth

Post Details
Company: Unsloth
Author: Daniel & Michael
Word Count: 1,661
Language: English
Summary

Unsloth has introduced Flex Attention for training OpenAI's gpt-oss models, enabling significantly longer context lengths, lower VRAM usage, and faster training than other implementations, including Flash Attention 3. This allows training with up to 60K context length on 80GB of VRAM using BF16 LoRA, and even longer with QLoRA, while also resolving earlier problems such as infinite training losses on float16 GPUs. Flex Attention provides customizable attention mechanisms, supporting efficient training through attention sinks and sliding-window attention. QLoRA fine-tuned gpt-oss models can now be exported to platforms such as llama.cpp, vLLM, and Hugging Face, and direct fine-tuning support has been improved, making the models more flexible and usable. Additionally, bug fixes ensure consistent training loss behavior across different GPU configurations.
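To make the attention-sink and sliding-window idea concrete, here is a minimal sketch of the masking rule such a mechanism applies per query/key position. The function name, window size, and sink count are illustrative assumptions, not Unsloth's actual implementation; in practice this logic would be expressed as a mask for a fused attention kernel (e.g. a Flex Attention `mask_mod`).

```python
def allowed(q_idx: int, kv_idx: int, window: int = 128, num_sinks: int = 4) -> bool:
    """Hypothetical sliding-window causal mask with attention sinks.

    A query token may attend to a key token if the key is:
      - not in the future (causal), AND
      - either within `window` positions behind the query (sliding window)
      - or one of the first `num_sinks` tokens (attention sinks, which
        remain visible to every query regardless of distance).
    """
    causal = kv_idx <= q_idx
    in_window = q_idx - kv_idx < window
    is_sink = kv_idx < num_sinks
    return causal and (in_window or is_sink)


# Query at position 200 can see a sink token, a recent token,
# but not a distant non-sink token or a future token.
print(allowed(200, 0))    # sink token -> True
print(allowed(200, 150))  # within window -> True
print(allowed(200, 50))   # distant, non-sink -> False
print(allowed(10, 20))    # future token -> False
```

Because most distant keys are masked out, the kernel can skip entire blocks of the attention matrix, which is where the VRAM and speed savings over dense attention come from.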