
Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas

Blog post from Together AI

Post Details
Company: Together AI
Date Published:
Author: Austin Silveria, Soham Govande, Dan Fu
Word Count: 1,550
Language: English
Hacker News Points: -
Summary

Chipmunk is a training-free method that accelerates diffusion transformers with hardware-aware, dynamic column-sparse deltas. It caches attention weights and MLP activations from previous diffusion steps and, at each new step, computes only a sparse "delta" against those cached activations. This yields significant speedups in both video and image generation, including up to 3.7x faster video generation at 720x1280 resolution for a 5-second video. The method exploits two properties of diffusion transformers: activations change slowly across steps, and they are inherently sparse, so much of the per-step compute can be skipped. Chipmunk also relies on hardware-efficient sparsity patterns, optimized kernels, and fast cache-writeback mechanisms to realize these gains in practice. The technique is designed to be open-sourced and to integrate with a range of model architectures for further acceleration.
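The core idea, recomputing only a sparse set of columns against a cached dense result, can be illustrated with a minimal NumPy sketch. This is a hypothetical illustration, not Chipmunk's actual API or kernels: the function name, mask selection, and MLP shape are assumptions, and the real system uses optimized GPU kernels rather than dense matmuls on column subsets.

```python
import numpy as np

def sparse_delta_mlp(x, W, cache, column_mask):
    """Hypothetical sketch of the cached column-sparse-delta idea.

    Instead of recomputing the full MLP output y = x @ W at every
    diffusion step, start from the output cached at a previous step and
    refresh only the columns selected by `column_mask` -- the ones whose
    activations are expected to have changed the most.
    """
    y = cache.copy()                    # reuse activations from a previous step
    cols = np.flatnonzero(column_mask)  # indices of the columns to recompute
    y[:, cols] = x @ W[:, cols]         # column-sparse recompute (the "delta")
    return y

# Toy usage: refresh 25% of columns against a slightly stale cache.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
W = rng.standard_normal((8, 16))
cache = x @ W                                     # dense result from a prior step
x_new = x + 0.01 * rng.standard_normal(x.shape)   # activations drift slowly
mask = np.zeros(16, dtype=bool)
mask[:4] = True                                   # recompute first 4 columns only
y = sparse_delta_mlp(x_new, W, cache, mask)
```

Because diffusion activations change slowly between steps, the untouched cached columns remain a good approximation while only a fraction of the matmul cost is paid each step.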