Chipmunk is a novel training-free method that accelerates diffusion transformers with hardware-aware, dynamic column-sparse deltas. It caches attention weights and MLP activations from previous diffusion steps and, at each subsequent step, dynamically computes only a sparse "delta" against these cached values. This yields significant speedups in both video and image generation, including up to 3.7x faster generation of a 5-second video at 720x1280 resolution. The method exploits two properties of diffusion transformer activations to reduce compute costs: they change slowly across steps, and their step-to-step changes are inherently sparse. Chipmunk pairs this with hardware-efficient sparsity patterns, optimized kernels, and fast cache-writeback mechanisms to realize its performance gains. The technique is intended to be open-sourced and integrated with various model architectures for further acceleration.
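To make the caching-plus-sparse-delta idea concrete, here is a minimal PyTorch sketch for a single MLP layer. All names (`SparseDeltaMLP`, `w1`, `w2`, `keep_frac`, `dense_step`, `sparse_step`) are illustrative assumptions, not Chipmunk's actual API, and the sketch omits the hardware-aware column grouping and custom kernels that the real method relies on; it only shows the arithmetic of patching a cached output with a column-sparse delta.

```python
import torch

class SparseDeltaMLP:
    """Sketch of the cached column-sparse-delta idea for one MLP layer.

    On a dense step, we compute everything, cache the hidden activations
    and output, and select the hidden columns with the largest activations.
    On a sparse step, we recompute only those columns and patch the cached
    output with the resulting delta (a fast cache writeback).
    """

    def __init__(self, w1, w2, keep_frac=0.1):
        self.w1, self.w2 = w1, w2                      # (d_in, d_hid), (d_hid, d_out)
        self.k = max(1, int(keep_frac * w1.shape[1]))  # hidden columns to recompute
        self.h_cache = self.y_cache = self.idx = None

    def dense_step(self, x):
        # full forward pass; refresh caches and the active-column set
        self.h_cache = torch.relu(x @ self.w1)
        self.y_cache = self.h_cache @ self.w2
        self.idx = self.h_cache.abs().mean(0).topk(self.k).indices
        return self.y_cache

    def sparse_step(self, x):
        # recompute only the selected hidden columns (column-sparse work)
        h_new = torch.relu(x @ self.w1[:, self.idx])
        delta = h_new - self.h_cache[:, self.idx]
        # patch the cached output exactly: y = sum_j h[:, j] * w2[j, :]
        self.y_cache = self.y_cache + delta @ self.w2[self.idx]
        self.h_cache[:, self.idx] = h_new  # write updated columns back
        return self.y_cache
```

A hypothetical usage pattern, interleaving occasional dense steps (to refresh the caches) with cheaper sparse steps:

```python
w1, w2 = torch.randn(512, 2048), torch.randn(2048, 512)
mlp = SparseDeltaMLP(w1, w2, keep_frac=0.1)
y0 = mlp.dense_step(torch.randn(4, 512))   # step 0: full compute
y1 = mlp.sparse_step(torch.randn(4, 512))  # step 1: ~10% of hidden columns
```

Note that because only the selected columns' activations change between steps, the patched output is exact for those columns; the approximation lies entirely in freezing the unselected columns at their cached values. In the actual method, the active columns would additionally be packed to match GPU tile sizes, which is what makes the sparsity pattern hardware-efficient.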