Llama 3.2 Vision fine-tuning
Blog post from Unsloth
Unsloth has expanded its support for vision and multimodal models, most notably Meta's Llama 3.2 Vision models, offering faster and more memory-efficient fine-tuning than Hugging Face with Flash Attention 2. The team has published Google Colab notebooks for several use cases, including radiography analysis, converting handwriting to LaTeX, and general question answering, illustrating the range of its fine-tuning support (a sketch of the typical workflow follows below). Unsloth has also fixed several bugs and reduced memory usage, so that models such as Pixtral now fit on a 16GB GPU. Newly supported models include Qwen 2.5 and its variants, which offer extended context lengths through YaRN scaling (see the configuration sketch further below). Users are encouraged to follow Unsloth on Hugging Face for updates and to join the community channels for support and engagement.
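
The post centers on the fine-tuning workflow itself. Below is a minimal sketch of that workflow using Unsloth's FastVisionModel API; the model name, LoRA rank, and other hyperparameters are illustrative assumptions rather than values taken from the post.

from unsloth import FastVisionModel

# Load Llama 3.2 Vision in 4-bit to keep memory use low (illustrative model name).
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",
    load_in_4bit=True,
)

# Attach LoRA adapters; Unsloth lets you choose whether to fine-tune the
# vision layers, the language layers, or both. r and lora_alpha are
# assumed example values.
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    r=16,
    lora_alpha=16,
)

# Switch the model into training mode before handing it to a trainer.
FastVisionModel.for_training(model)

From here the model can be passed to a standard trainer (for example trl's SFTTrainer) together with an image-text dataset.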
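As for the extended context lengths mentioned above, Qwen 2.5's documentation describes enabling YaRN through a rope_scaling entry in the model config. The snippet below is a hedged sketch using Hugging Face transformers; the factor of 4.0 reflects Qwen's published recipe for stretching the native 32K window to roughly 128K tokens, and should be treated as an assumption here rather than a detail from the Unsloth post.

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
# YaRN rope scaling: factor = target context / native context (assumed values).
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    config=config,
)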