
Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas

Blog post from Together AI

Post Details
Company: Together AI
Date Published:
Author: Austin Silveria, Soham Govande, Dan Fu
Word Count: 1,550
Language: English
Hacker News Points: -
Summary

Chipmunk is a training-free method that accelerates diffusion transformers with hardware-aware, dynamic column-sparse deltas. It caches attention weights and MLP activations from previous diffusion steps and, at each new step, computes only a sparse "delta" against those cached activations. This yields significant speedups in both video and image generation, including up to 3.7x faster video generation at 720x1280 resolution for a 5-second video. The method exploits two properties of diffusion transformers: activations change slowly across steps, and they are inherently sparse, so much of the per-step compute can be skipped. Chipmunk also relies on hardware-efficient sparsity patterns, optimized kernels, and fast cache-writeback mechanisms to realize these gains in practice. The technique is designed to be open-sourced and to integrate with a range of model architectures for further acceleration.
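The core idea, recomputing only a sparse set of columns against a cached dense result, can be illustrated with a minimal NumPy sketch. This is a hypothetical illustration, not Chipmunk's actual API or kernels: the function name, mask selection, and MLP shape are assumptions, and the real system uses optimized GPU kernels rather than dense matmuls on column subsets.

```python
import numpy as np

def sparse_delta_mlp(x, W, cache, column_mask):
    """Hypothetical sketch of the cached column-sparse-delta idea.

    Instead of recomputing the full MLP output y = x @ W at every
    diffusion step, start from the output cached at a previous step and
    refresh only the columns selected by `column_mask` -- the ones whose
    activations are expected to have changed the most.
    """
    y = cache.copy()                    # reuse activations from a previous step
    cols = np.flatnonzero(column_mask)  # indices of the columns to recompute
    y[:, cols] = x @ W[:, cols]         # column-sparse recompute (the "delta")
    return y

# Toy usage: refresh 25% of columns against a slightly stale cache.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
W = rng.standard_normal((8, 16))
cache = x @ W                                     # dense result from a prior step
x_new = x + 0.01 * rng.standard_normal(x.shape)   # activations drift slowly
mask = np.zeros(16, dtype=bool)
mask[:4] = True                                   # recompute first 4 columns only
y = sparse_delta_mlp(x_new, W, cache, mask)
```

Because diffusion activations change slowly between steps, the untouched cached columns remain a good approximation while only a fraction of the matmul cost is paid each step.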