
Vision Reinforcement Learning

Blog post from Unsloth

Post Details
Company
Unsloth
Date Published
Author
Daniel & Michael
Word Count
918
Language
English
Hacker News Points
-
Summary

Unsloth announces vision/multimodal reinforcement learning (RL) support for models such as Gemma 3 and Qwen2.5-VL, cutting VRAM usage by 90% and extending context lengths 10x with no loss of accuracy. The update incorporates the GSPO algorithm, enables training on a free Colab T4 GPU, adds vLLM VLM integration, and introduces a Standby feature that speeds up RL training by up to 10% while increasing context lengths without additional memory use. Unsloth also now supports direct fine-tuning of gpt-oss models, resolving earlier inference issues through collaboration with Hugging Face and OpenAI, and aligns training-loss behavior across different GPU setups. Benchmarks show significant context-length gains for models such as Qwen3-32B and Llama-3.1-8B.
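The GSPO algorithm mentioned in the summary replaces GRPO's per-token importance ratios with a single, length-normalized sequence-level ratio per sampled response. A minimal illustrative sketch of that objective in plain Python follows; this is a simplification based on the published GSPO formulation, not Unsloth's actual implementation (function name and inputs are hypothetical):

```python
import math

def gspo_loss(logp_new, logp_old, rewards, clip_eps=0.2):
    """Illustrative sequence-level GSPO objective for one group of responses.

    logp_new / logp_old: per-response lists of token log-probs under the
    current and old (sampling) policies; rewards: one scalar per response.
    """
    # Group-normalized advantages, as in GRPO.
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # avoid division by zero when all rewards match
    advs = [(r - mean) / std for r in rewards]

    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advs):
        n = len(lp_new)
        # Length-normalized sequence importance ratio:
        #   s = (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|)
        s = math.exp((sum(lp_new) - sum(lp_old)) / n)
        clipped = max(min(s, 1.0 + clip_eps), 1.0 - clip_eps)
        # PPO-style pessimistic objective at the sequence level.
        total += min(s * adv, clipped * adv)
    return -total / len(rewards)  # negated: trainers minimize the loss
```

Because clipping happens once per sequence rather than per token, GSPO avoids the high-variance token-level ratios that can destabilize GRPO on long generations, which fits the longer-context RL training the post emphasizes.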