Ollama is now powered by MLX on Apple Silicon in preview
Blog post from Ollama
Ollama has released a preview build for Apple silicon that runs on MLX, Apple's machine learning framework, and delivers a significant performance boost, particularly on the M5, M5 Pro, and M5 Max chips. On those chips, Ollama uses the GPU Neural Accelerators to improve both time to first token and generation speed, which matters for coding agents such as Pi and Claude Code. Ollama 0.19 also adds support for NVIDIA's NVFP4 format, which maintains model accuracy while reducing memory bandwidth and storage requirements, so that results stay in line with those from production inference providers. Caching is improved as well: the cache is reused across conversations and intelligent checkpoints are used, which makes responses faster. The preview release accelerates the Qwen3.5-35B-A3B model, with a focus on coding tasks, and is part of ongoing work to support future models and architectures.
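The two speed figures mentioned above, time to first token and generation speed, can be estimated from the timing fields that Ollama's /api/generate endpoint returns (durations are reported in nanoseconds). The following is a minimal sketch, not part of the original post: the helper names summarize_metrics and generate are hypothetical, the model tag is a placeholder to replace with a model installed locally, and prompt-processing time is used only as a rough proxy for time to first token.

```python
import json
import urllib.request

NS_PER_SECOND = 1_000_000_000  # Ollama reports durations in nanoseconds


def summarize_metrics(response: dict) -> dict:
    """Derive speed figures from the timing fields of an Ollama response.

    Hypothetical helper. Missing fields are treated as zero, since some
    responses may omit them.
    """
    prompt_s = response.get("prompt_eval_duration", 0) / NS_PER_SECOND
    eval_s = response.get("eval_duration", 0) / NS_PER_SECOND
    prompt_tokens = response.get("prompt_eval_count", 0)
    eval_tokens = response.get("eval_count", 0)
    return {
        # Prompt-processing time: a rough proxy for time to first token.
        "ttft_proxy_s": prompt_s,
        "prompt_tokens_per_s": prompt_tokens / prompt_s if prompt_s else 0.0,
        "generation_tokens_per_s": eval_tokens / eval_s if eval_s else 0.0,
    }


def generate(model: str, prompt: str, host: str = "http://localhost:11434") -> dict:
    """Send one non-streaming request to a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


if __name__ == "__main__":
    MODEL = "your-model-tag"  # placeholder: use a model tag installed locally
    result = generate(MODEL, "Write a function that reverses a string.")
    print(summarize_metrics(result))
```

Running the same prompt twice and comparing ttft_proxy_s gives a rough feel for how much cache reuse helps, although the exact numbers depend on hardware, model, and prompt length.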