
Vision Reinforcement Learning

Blog post from Unsloth

Post Details
Company
Unsloth
Date Published
Author
Daniel & Michael
Word Count
918
Language
English
Hacker News Points
-
Summary

Unsloth announces vision/multimodal reinforcement learning (RL) support for models such as Gemma 3 and Qwen2.5-VL, cutting VRAM usage by 90% and extending context lengths 10x with no loss of accuracy. The update incorporates the GSPO algorithm, enables training on a free Colab T4 GPU, adds vLLM VLM integration, and introduces a Standby feature that speeds up RL training by up to 10% while increasing context lengths without additional memory use. Unsloth also now supports direct fine-tuning of gpt-oss models, resolving earlier inference issues through collaboration with Hugging Face and OpenAI, and aligns training-loss behavior across different GPU setups. Benchmarks show significant context-length gains for models such as Qwen3-32B and Llama-3.1-8B.
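The GSPO algorithm mentioned in the summary replaces GRPO's per-token importance ratios with a single, length-normalized sequence-level ratio per sampled response. A minimal illustrative sketch of that objective in plain Python follows; this is a simplification based on the published GSPO formulation, not Unsloth's actual implementation (function name and inputs are hypothetical):

```python
import math

def gspo_loss(logp_new, logp_old, rewards, clip_eps=0.2):
    """Illustrative sequence-level GSPO objective for one group of responses.

    logp_new / logp_old: per-response lists of token log-probs under the
    current and old (sampling) policies; rewards: one scalar per response.
    """
    # Group-normalized advantages, as in GRPO.
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # avoid division by zero when all rewards match
    advs = [(r - mean) / std for r in rewards]

    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advs):
        n = len(lp_new)
        # Length-normalized sequence importance ratio:
        #   s = (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|)
        s = math.exp((sum(lp_new) - sum(lp_old)) / n)
        clipped = max(min(s, 1.0 + clip_eps), 1.0 - clip_eps)
        # PPO-style pessimistic objective at the sequence level.
        total += min(s * adv, clipped * adv)
    return -total / len(rewards)  # negated: trainers minimize the loss
```

Because clipping happens once per sequence rather than per token, GSPO avoids the high-variance token-level ratios that can destabilize GRPO on long generations, which fits the longer-context RL training the post emphasizes.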