Run DeepSeek-V3.1 Dynamic 1-bit GGUFs
Blog post from Unsloth
DeepSeek-V3.1 is an updated hybrid reasoning model from DeepSeek, positioned to rival frontier models such as OpenAI's GPT-4.5 and Google's Gemini 2.5 Pro. The full 671B-parameter model requires about 715GB of disk space; Unsloth's selective quantization shrinks it to roughly 170GB, a reduction of around 75%.

The model can be run effectively using Unsloth's 1-bit Dynamic 2.0 GGUFs on popular inference frameworks such as llama.cpp and Ollama. Recommended settings for best results: temperature 0.6 (to minimize repetition), top_p 0.95, and a context length of 128K tokens.

The update also fixes chat template issues to improve compatibility with various inference engines, and the blog provides detailed instructions for running the model locally or via platforms like Hugging Face. Users are encouraged to leverage community resources and platforms for support and updates.
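As a concrete illustration, the recommended settings above can be assembled into a llama.cpp command line. This is a minimal sketch: the binary path and GGUF filename below are assumptions for illustration, not taken from the blog, so adjust them to match your download.

```python
# Sketch: build a llama.cpp invocation using the recommended sampling
# settings for DeepSeek-V3.1 (temperature 0.6, top_p 0.95, 128K context).
# The binary path and model filename are placeholders -- adjust for your setup.
import shlex

def build_llama_cli_cmd(model_path, n_ctx=131072, temp=0.6, top_p=0.95):
    """Return the argument list for llama-cli with the recommended settings."""
    return [
        "./llama-cli",        # llama.cpp CLI binary (path is an assumption)
        "-m", model_path,     # path to the downloaded GGUF file
        "-c", str(n_ctx),     # context length: 128K tokens
        "--temp", str(temp),  # temperature 0.6 to minimize repetition
        "--top-p", str(top_p),# nucleus sampling cutoff 0.95
    ]

# Hypothetical filename for the 1-bit dynamic quant; use your actual file.
cmd = build_llama_cli_cmd("DeepSeek-V3.1-UD-TQ1_0.gguf")
print(shlex.join(cmd))
```

Running the script prints the assembled command, which can then be executed against a local GGUF download. The flags `-m`, `-c`, `--temp`, and `--top-p` are standard llama.cpp CLI options.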