
Run QwQ-32B effectively + Bug Fixes

Blog post from Unsloth

Post Details
Company: Unsloth
Author: Daniel & Michael
Word Count: 2,207
Language: English
Summary

Qwen's QwQ-32B is a powerful reasoning model comparable to DeepSeek-R1, but its release was marred by infinite loops and repetition errors that did not reflect the model's true quality. To help users address these issues, Unsloth published a detailed guide and tutorial recommending specific inference settings, including temperature, top_k, and top_p values. They also identified and resolved issues affecting fine-tuning and provided updates to token settings. For llama.cpp in particular, the post suggests adjusting the ordering of samplers to avoid endless generations. Unsloth additionally introduced dynamic 4-bit quantizations to improve accuracy, and fine-tuning with Unsloth offers significant VRAM savings and faster training. Readers are encouraged to access additional resources and support through Unsloth's online platforms.
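The summary mentions recommended sampling settings and a reordered sampler chain for llama.cpp without listing the values. A minimal sketch of what such a configuration might look like is below; the specific numbers, model filename, and sampler ordering are illustrative assumptions, not values taken from the post itself.

```python
# Hedged sketch of QwQ-32B-style sampling settings. The exact values
# (temperature, top_k, top_p) are illustrative assumptions, not quoted
# from the post.
sampling_settings = {
    "temperature": 0.6,  # a lower temperature typically curbs runaway repetition
    "top_k": 40,         # restrict sampling to the 40 most likely tokens
    "top_p": 0.95,       # nucleus-sampling cumulative-probability cutoff
}

# The post suggests reordering llama.cpp's samplers so temperature is applied
# after the truncation samplers; a hypothetical command line (model filename
# and sampler order are assumptions):
llama_cpp_args = [
    "./llama-cli",
    "--model", "QwQ-32B-Q4_K_M.gguf",
    "--samplers", "top_k;top_p;min_p;temperature",
    "--temp", str(sampling_settings["temperature"]),
    "--top-k", str(sampling_settings["top_k"]),
    "--top-p", str(sampling_settings["top_p"]),
]
print(" ".join(llama_cpp_args))
```

Applying temperature after top_k/top_p means the distribution is first truncated to plausible tokens and only then reshaped, which is one common way to tame endless generations.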