Ollama is now powered by MLX on Apple Silicon in preview

Blog post from Ollama

Post Details
Word Count: 527
Summary

Ollama has released a preview build for Apple silicon that runs on MLX, Apple's machine learning framework. The gains are most pronounced on M5, M5 Pro, and M5 Max chips, where the release uses the GPU Neural Accelerators to improve both time to first token and generation speed, which benefits coding agents such as Pi and Claude Code. Ollama 0.19 also adds support for NVIDIA's NVFP4 format, which maintains model accuracy while reducing memory bandwidth and storage requirements, so that local results stay in line with what inference providers serve in production. Caching is improved as well: the cache is reused across conversations and intelligent checkpoints are kept, which makes responses faster. The preview is tuned to accelerate the Qwen3.5-35B-A3B model, with a focus on coding tasks, and work to support future models and architectures is ongoing.
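
The two metrics named in the summary, time to first token and generation speed, can be estimated from the timing fields that Ollama's REST API returns with a completed response (`prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`, with durations in nanoseconds). The following is a minimal sketch, not part of the original post: the helper name `summarize_timings` is hypothetical, prompt-evaluation time is used only as an approximation of time to first token, and the numbers in the example are placeholders, not measurements.

```python
NS_PER_S = 1_000_000_000  # Ollama reports durations in nanoseconds


def summarize_timings(resp: dict) -> dict:
    """Derive prefill and decode rates from an Ollama response dict.

    Hypothetical helper for illustration. Prompt-evaluation time is
    treated as a rough stand-in for time to first token; model load
    time and network overhead are ignored.
    """
    prefill_s = resp["prompt_eval_duration"] / NS_PER_S
    decode_s = resp["eval_duration"] / NS_PER_S
    return {
        "prefill_seconds": prefill_s,
        "prefill_tokens_per_s": resp["prompt_eval_count"] / prefill_s if prefill_s else 0.0,
        "decode_tokens_per_s": resp["eval_count"] / decode_s if decode_s else 0.0,
    }


# Placeholder values for illustration only, not benchmark results.
example = {
    "prompt_eval_count": 1000,
    "prompt_eval_duration": 500_000_000,   # 0.5 s
    "eval_count": 200,
    "eval_duration": 2_000_000_000,        # 2 s
}

if __name__ == "__main__":
    print(summarize_timings(example))
```

Running the same prompt twice and comparing `prefill_seconds` is one way to observe the effect of cache reuse, since a reused cache should shorten prompt evaluation on the second request.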