Company
Date Published
Author
Christopher Fleetwood and Pedro Cuenca
Word count
1910
Language
-
Hacker News points
None

Summary

The blog post describes how to run dots.ocr, a 3 billion parameter OCR model from RedNote, on Apple devices using Core ML and MLX. It highlights the advantages of running models on-device: no API keys to manage, zero cost, and no network dependency. It also notes the main challenge, which is limited compute and power. Apple's Neural Engine is singled out for its power efficiency compared to the CPU and GPU, though it is only accessible through the closed-source Core ML framework. Converting from PyTorch to Core ML involves capturing the execution graph and compiling it with coremltools, a process that can be arduous because of the errors encountered along the way and the model's complexity. The vision encoder of dots.ocr is converted with Core ML and the LM backbone with MLX, with adjustments made to simplify and optimize the model for on-device deployment. The conversion ultimately succeeds, but the resulting model is over 5GB, which is impractical for deployment and prompts the further optimizations discussed in subsequent articles.
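
The capture-then-compile step mentioned in the summary can be sketched roughly as follows. This is a minimal illustration and not the authors' actual conversion script: the function name `convert_to_coreml`, the input name `"pixels"`, the FP16 precision, and the choice of `torch.jit.trace` for graph capture are all assumptions made for the example.

```python
def convert_to_coreml(model, example_input, out_path):
    """Trace a PyTorch module and compile it to a Core ML package.

    Hypothetical helper for illustration only; the names and settings
    here are assumptions, not taken from the blog post.
    """
    # Imported lazily so this sketch can be loaded without the heavy
    # dependencies installed.
    import torch
    import coremltools as ct

    model.eval()

    # Step 1: capture the execution graph by running one example input.
    with torch.no_grad():
        traced = torch.jit.trace(model, example_input)

    # Step 2: compile the captured graph with coremltools.
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="pixels", shape=tuple(example_input.shape))],
        convert_to="mlprogram",                # ML Program format (.mlpackage)
        compute_precision=ct.precision.FLOAT16,  # assumed precision
        compute_units=ct.ComputeUnit.ALL,      # let Core ML pick CPU, GPU, or Neural Engine
    )

    # Step 3: write the package to disk; out_path should end in ".mlpackage".
    mlmodel.save(out_path)
    return mlmodel
```

Calling it would look like `convert_to_coreml(vision_encoder, torch.rand(1, 3, 224, 224), "encoder.mlpackage")`, where `vision_encoder` and the input shape are placeholders. In practice this is the step where a complex model tends to surface conversion errors, which is why the post describes simplifying the model first.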