Vision Language Models (VLMs) are AI models that analyze visual content alongside text. Running them locally, with tools like Optimum Intel and OpenVINO, offers better privacy and lower latency than calling a remote API. This blog post walks through deploying a VLM, specifically the SmolVLM2-256M model, on Intel CPUs in three steps, with no expensive hardware required: convert the model to OpenVINO's Intermediate Representation (IR), apply quantization to optimize performance, and run inference to evaluate the model's efficiency.

Quantization lowers the numerical precision of the model's weights, which shrinks the model and reduces memory usage, though it may slightly reduce accuracy. The post's benchmarks show clear latency and throughput gains on Intel CPUs when OpenVINO is combined with 8-bit weight-only quantization, making a small VLM like SmolVLM2-256M practical on devices with limited resources.
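As a concrete starting point, the conversion and quantization steps can be driven from Python through Optimum Intel. The snippet below is a minimal sketch, not the post's exact commands: it assumes the `HuggingFaceTB/SmolVLM2-256M-Video-Instruct` checkpoint, the `OVModelForVisualCausalLM` class, and the `OVWeightQuantizationConfig` helper from a recent `optimum-intel` release; adjust the model id and output directory to your setup.

```python
# Minimal sketch: export SmolVLM2 to OpenVINO IR with 8-bit weight-only
# quantization. Model id, class names, and the output path are assumptions
# based on recent optimum-intel releases, not commands taken from the post.
from optimum.intel import OVModelForVisualCausalLM, OVWeightQuantizationConfig

model_id = "HuggingFaceTB/SmolVLM2-256M-Video-Instruct"  # assumed checkpoint

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly;
# bits=8 requests weight-only quantization (activations stay in full precision).
model = OVModelForVisualCausalLM.from_pretrained(
    model_id,
    export=True,
    quantization_config=OVWeightQuantizationConfig(bits=8),
)

# Save the converted, quantized model so later runs skip the export step.
model.save_pretrained("smolvlm2-256m-ov-int8")
```

The same conversion can also be done ahead of time from the command line with `optimum-cli export openvino` and its `--weight-format int8` flag.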
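Once the IR files are on disk, inference looks much like plain Transformers. Again a hedged sketch rather than the post's code: the chat-template structure follows the standard SmolVLM2 examples, and the image URL is only a placeholder.

```python
# Minimal inference sketch, assuming the directory produced above and the
# standard SmolVLM2 chat-template format; the image URL is a placeholder.
from optimum.intel import OVModelForVisualCausalLM
from transformers import AutoProcessor

model_dir = "smolvlm2-256m-ov-int8"
processor = AutoProcessor.from_pretrained(model_dir)
model = OVModelForVisualCausalLM.from_pretrained(model_dir)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.jpg"},  # placeholder
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Tokenize the multimodal chat prompt and run generation on the CPU.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```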
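To get a first estimate of the kind of latency and throughput numbers the post reports, a simple timing loop around `generate` is enough. This continues from the snippet above (reusing `model` and `inputs`) and is a rough sketch, not the post's benchmark harness; real measurements should average over many runs.

```python
# Rough benchmark sketch: time one generate() call and derive tokens/second.
# Reuses `model` and `inputs` from the inference snippet above.
import time

_ = model.generate(**inputs, max_new_tokens=16)  # warm-up call

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=128)
elapsed = time.perf_counter() - start

# generate() returns the prompt plus new tokens, so subtract the prompt length.
new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"latency: {elapsed:.2f} s, throughput: {new_tokens / elapsed:.1f} tokens/s")
```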