SOTA OCR on-device with Core ML and dots.ocr

Post Details

Company

Hugging Face

Date Published

Oct. 2, 2025

Author

Christopher Fleetwood and Pedro Cuenca

Word Count

1,910

Language

-

Hacker News Points

-

Source URL

huggingface.co/blog/dots-ocr-ne

Summary

The blog post describes running dots.ocr, a 3-billion-parameter OCR model from RedNote, on Apple devices using Core ML and MLX. It highlights the advantages of on-device inference (no API keys, zero cost, and no network dependency) while noting the challenges posed by limited compute and power budgets. Apple's Neural Engine is emphasized for its power efficiency relative to the CPU and GPU, though it is accessible only through the closed-source Core ML framework. Converting from PyTorch to Core ML involves capturing the model's execution graph and compiling it with coremltools, a process that can be arduous because of the errors encountered along the way and the model's complexity. The vision encoder and LM backbone of dots.ocr are converted with Core ML and MLX, respectively, with adjustments made to simplify and optimize the model for on-device deployment. The conversion ultimately succeeds, but the resulting model, at over 5 GB, is impractical to deploy, prompting further optimizations that are discussed in subsequent articles.