Content Deep Dive

Run DeepSeek-R1 Dynamic 1.58-bit

Blog post from Unsloth

Post Details
Company
Unsloth
Date Published
Author
Daniel & Michael
Word Count
2,359
Language
English
Hacker News Points
-
Summary

DeepSeek-R1, an open-source model rivaling OpenAI's o1 reasoning model, has been optimized for local use through dynamic quantization, reducing the model size from 720GB to 131GB while maintaining functionality. The approach selectively quantizes certain layers to higher precision (e.g., 4-bit) while leaving most mixture-of-experts (MoE) layers at 1.5 bits, which prevents failure modes such as endless loops and incorrect outputs. Users can run the model without a GPU, although it will be slow; for reasonable speed, combined VRAM and RAM should total at least 80GB. Benchmarks show the 1.58-bit version scoring 69.2% on a Flappy Bird game benchmark, compared to 91.7% for a 2-bit version, whereas naively quantizing all layers yields poor results. The DeepSeek-R1 architecture exploits MoE layers to increase parameter count without proportionally raising computational cost, and specific layers are kept at higher precision to maintain accuracy. The model and its components are available on platforms like Hugging Face, and the post provides detailed instructions for running and optimizing the model on different hardware configurations.
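Since the quantized files are distributed on Hugging Face, fetching only the 1.58-bit dynamic variant can be done with `huggingface_hub`. A minimal sketch, assuming the `unsloth/DeepSeek-R1-GGUF` repository and the `UD-IQ1_S` filename pattern used for the 1.58-bit dynamic quant (both assumptions based on the post, not this summary):

```python
# Sketch: download only the 1.58-bit dynamic quant shards.
# Repo id and filename pattern are assumptions taken from the post.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/DeepSeek-R1-GGUF",  # assumed repo for the GGUF files
    local_dir="DeepSeek-R1-GGUF",
    allow_patterns=["*UD-IQ1_S*"],       # match only the 1.58-bit dynamic shards
)
```

Filtering with `allow_patterns` avoids downloading the larger quantization variants stored in the same repository.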
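The 80GB VRAM+RAM guideline implies deciding how many of the model's layers can be offloaded to the GPU, with the rest served from system RAM. A hypothetical back-of-the-envelope helper, assuming roughly uniform layer sizes, 61 transformer layers, and the 131GB 1.58-bit file; the uniform-size assumption is not exact, since the dynamic quant keeps some layers at higher precision:

```python
import math

def estimate_gpu_layers(vram_gb: float, file_size_gb: float = 131.0,
                        n_layers: int = 61) -> int:
    """Rough estimate of how many layers fit in VRAM.

    Illustrative heuristic only: treats all layers as equal in size,
    which the dynamic quant's mixed-precision layout does not strictly
    satisfy. Values for file size and layer count are assumptions.
    """
    return max(0, math.floor(vram_gb / file_size_gb * n_layers))

# Example: a 24GB GPU with the 131GB 1.58-bit quant
print(estimate_gpu_layers(24))  # -> 11 layers offloaded to GPU
```

The remaining layers stay in RAM, which is why total VRAM plus RAM, rather than VRAM alone, determines whether the model runs at a usable speed.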