Content Deep Dive

Run DeepSeek-R1 Dynamic 1.58-bit

Blog post from Unsloth

Post Details
Company
Unsloth
Date Published
Author
Daniel & Michael
Word Count
2,359
Language
English
Hacker News Points
-
Summary

DeepSeek-R1, an open-source model rivaling OpenAI's o1 reasoning model, has been optimized for local use through dynamic quantization, reducing the model size from 720GB to 131GB while maintaining functionality. The approach selectively quantizes certain layers to higher precision (e.g., 4-bit) while leaving most mixture-of-experts (MoE) layers at 1.5 bits, which prevents failure modes such as endless loops and incorrect outputs. Users can run the model without a GPU, although it will be slow; for reasonable speed, combined VRAM and RAM should total at least 80GB. Benchmarks show the 1.58-bit version scoring 69.2% on a Flappy Bird game benchmark, compared to 91.7% for a 2-bit version, whereas naively quantizing all layers yields poor results. The DeepSeek-R1 architecture exploits MoE layers to increase parameter count without proportionally raising computational cost, and specific layers are kept at higher precision to maintain accuracy. The model and its components are available on platforms like Hugging Face, and the post provides detailed instructions for running and optimizing the model on different hardware configurations.
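Since the quantized files are distributed on Hugging Face, fetching only the 1.58-bit dynamic variant can be done with `huggingface_hub`. A minimal sketch, assuming the `unsloth/DeepSeek-R1-GGUF` repository and the `UD-IQ1_S` filename pattern used for the 1.58-bit dynamic quant (both assumptions based on the post, not this summary):

```python
# Sketch: download only the 1.58-bit dynamic quant shards.
# Repo id and filename pattern are assumptions taken from the post.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/DeepSeek-R1-GGUF",  # assumed repo for the GGUF files
    local_dir="DeepSeek-R1-GGUF",
    allow_patterns=["*UD-IQ1_S*"],       # match only the 1.58-bit dynamic shards
)
```

Filtering with `allow_patterns` avoids downloading the larger quantization variants stored in the same repository.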
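The 80GB VRAM+RAM guideline implies deciding how many of the model's layers can be offloaded to the GPU, with the rest served from system RAM. A hypothetical back-of-the-envelope helper, assuming roughly uniform layer sizes, 61 transformer layers, and the 131GB 1.58-bit file; the uniform-size assumption is not exact, since the dynamic quant keeps some layers at higher precision:

```python
import math

def estimate_gpu_layers(vram_gb: float, file_size_gb: float = 131.0,
                        n_layers: int = 61) -> int:
    """Rough estimate of how many layers fit in VRAM.

    Illustrative heuristic only: treats all layers as equal in size,
    which the dynamic quant's mixed-precision layout does not strictly
    satisfy. Values for file size and layer count are assumptions.
    """
    return max(0, math.floor(vram_gb / file_size_gb * n_layers))

# Example: a 24GB GPU with the 131GB 1.58-bit quant
print(estimate_gpu_layers(24))  # -> 11 layers offloaded to GPU
```

The remaining layers stay in RAM, which is why total VRAM plus RAM, rather than VRAM alone, determines whether the model runs at a usable speed.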