Improved performance and model support with GGUF
Blog post from Ollama
Ollama 0.30 improves performance and broadens model compatibility across hardware platforms, building on contributions from NVIDIA and llama.cpp. On NVIDIA GPUs, the update delivers up to 20% faster processing. It also extends GPU acceleration to AMD and Intel devices through Vulkan, so no additional vendor-specific libraries are required. The release improves compatibility with the GGUF model ecosystem: users can run a wide range of models, including LFM, Prism, and models from Unsloth, directly from platforms such as Hugging Face. These models can be plugged into coding agents and personal assistants, and a model's tool-calling support can be checked with the ollama show command. Ollama credits Georgi Gerganov, the llama.cpp project, and hardware partners NVIDIA, AMD, Qualcomm, and Intel for their work on optimizing performance across platforms.
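
The tool-calling check mentioned above can also be scripted. The following is a minimal sketch, not something taken from the post: it assumes a local Ollama server at the default address, and it assumes the server's /api/show response includes a "capabilities" list in which tool-capable models report "tools". The helper names (supports_tools, fetch_model_info) and the model identifier in the usage note are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434"


def supports_tools(show_response: dict) -> bool:
    """Return True if the model metadata lists the 'tools' capability.

    Assumes the metadata carries a 'capabilities' list; a missing or
    empty list is treated as "no tool calling".
    """
    capabilities = show_response.get("capabilities") or []
    return "tools" in capabilities


def fetch_model_info(model: str, base_url: str = OLLAMA_URL) -> dict:
    """Ask the local Ollama server for a model's metadata via POST /api/show."""
    payload = json.dumps({"model": model}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/show",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server running and a model already pulled, a call such as supports_tools(fetch_model_info("hf.co/example-user/example-model-GGUF")) would report whether that (placeholder) model advertises tool calling. Keeping the check separate from the network call makes it easy to test against saved metadata.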