SANA-WM Bidirectional on Apple Silicon

Blog post from HuggingFace

Post Details
Company: HuggingFace
Author: Arjun Reddy
Word Count: 1,105
Summary

SANA-WM is a 2.6-billion-parameter world model for image-to-video generation. It was originally developed for CUDA-based systems and has been adapted to run on Apple Silicon, specifically an M3 Max MacBook Pro with 128 GB of unified memory. The port required bypassing CUDA dependencies and adopting a memory-efficient, staged execution process, because on Apple Silicon the CPU and GPU share a single memory pool. Generation is split into distinct stages, with components loaded and unloaded sequentially so that they never occupy memory at the same time. The runtime uses PyTorch MPS and Metal, avoids CUDA-specific resources, and targets controlled video rollouts rather than a real-time interactive experience. The port makes local video generation possible on Apple hardware; future work aims at a more interactive, game-like experience through better responsiveness, continuous state management, and low-latency updates. The runtime and patch set are publicly available, so others can replicate and extend the implementation.
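
The staged execution described above can be illustrated with a minimal sketch. This is not the port's actual code: the names `run_staged`, `release_memory`, and `toy_stages` are hypothetical, and the toy stages use plain numbers in place of real model components. The sketch assumes each stage provides a loader and a run function, and that PyTorch (2.0 or later, for `torch.mps.empty_cache`) may or may not be installed.

```python
import gc
from typing import Any, Callable, List, Tuple

try:
    import torch  # optional: only used to release cached MPS memory
except ImportError:
    torch = None

# A stage is (name, load, run): `load` builds the component,
# `run` applies it to the output of the previous stage.
Stage = Tuple[str, Callable[[], Any], Callable[[Any, Any], Any]]


def release_memory() -> None:
    """Drop unreachable objects and, on Apple Silicon, cached MPS memory."""
    gc.collect()
    if torch is not None and hasattr(torch, "mps") and torch.backends.mps.is_available():
        torch.mps.empty_cache()


def run_staged(stages: List[Stage], initial_input: Any) -> Any:
    """Run stages one at a time so only one component is resident at once."""
    data = initial_input
    for name, load, run in stages:
        component = load()  # bring this stage's component into memory
        try:
            data = run(component, data)
        finally:
            del component  # drop the reference before the next stage loads
            release_memory()
    return data


# Toy stand-ins (assumptions): numbers play the role of model components.
toy_stages: List[Stage] = [
    ("encode", lambda: 2, lambda c, x: x * c),
    ("generate", lambda: 3, lambda c, x: x + c),
    ("decode", lambda: 10, lambda c, x: x * c),
]

result = run_staged(toy_stages, 1)
print(result)
```

In a real pipeline the loaders would construct model components on the `mps` device and the run functions would pass tensors between stages. The trade-off is extra load time per stage in exchange for a lower peak memory footprint.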