Company
Date Published
Author
Dan Fu, Hermann Kumbong, Eric Nguyen, Chris Ré
Word count
1804
Language
English
Hacker News points
None

Summary

FlashFFTConv is an algorithm for efficiently computing FFT convolutions on GPUs. It speeds up convolutions by up to 7.93x over PyTorch and delivers up to 4.4x speedup end-to-end. It targets two bottlenecks of standard FFT algorithms on machine learning hardware: they incur expensive I/O between layers of the memory hierarchy, and they make poor use of the specialized matrix-matrix multiply units. The algorithm uses a Monarch decomposition of the FFT, which rewrites the FFT inside the convolution as a sequence of matrix-matrix multiply operations that run efficiently on tensor cores. The result is faster convolutions and better scaling with sequence length, which makes the approach suitable for long-sequence tasks such as audio analysis and DNA modeling. FlashFFTConv has been integrated into research codebases and is expected to enable new applications in machine learning.
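
To make the Monarch idea concrete, here is a minimal sketch in pure Python of an order-2 decomposition: a length-N FFT with N = N1 x N2 is computed as a small DFT matrix multiply, an elementwise twiddle correction, and a second DFT matrix multiply. The function names (`dft_matrix`, `monarch_fft`, `fft_conv`, `direct_conv`) and the toy inputs are hypothetical and chosen for illustration only. This is not the FlashFFTConv API or its GPU kernels; it only shows why the FFT can be expressed as matrix-matrix multiplies.

```python
import cmath


def dft_matrix(n, inverse=False):
    """Dense n x n DFT matrix (conjugated if inverse; no scaling applied)."""
    sign = 1 if inverse else -1
    return [[cmath.exp(sign * 2j * cmath.pi * r * c / n) for c in range(n)]
            for r in range(n)]


def matmul(a, b):
    """Plain matrix-matrix multiply; on a GPU this is the tensor-core step."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def monarch_fft(x, n1, n2, inverse=False):
    """Length n1*n2 FFT as: matmul, twiddle multiply, matmul (illustrative)."""
    n = n1 * n2
    assert len(x) == n
    sign = 1 if inverse else -1
    # Reshape the sequence row-major into an n1 x n2 matrix.
    a = [[x[n2 * r + c] for c in range(n2)] for r in range(n1)]
    # Step 1: length-n1 DFT down every column, as one matrix multiply.
    b = matmul(dft_matrix(n1, inverse), a)
    # Step 2: elementwise twiddle factors that couple the two small DFTs.
    c = [[b[k1][j] * cmath.exp(sign * 2j * cmath.pi * k1 * j / n)
          for j in range(n2)]
         for k1 in range(n1)]
    # Step 3: length-n2 DFT along every row, as a second matrix multiply.
    d = matmul(c, dft_matrix(n2, inverse))
    # Output index k = k1 + n1 * k2, i.e. read the result column-major.
    out = [d[k % n1][k // n1] for k in range(n)]
    return [v / n for v in out] if inverse else out


def fft_conv(u, k, n1, n2):
    """Causal convolution of u with kernel k via the FFT sketch above."""
    length, n = len(u), n1 * n2
    assert len(k) == length and n >= 2 * length  # zero-pad to avoid wraparound
    u_f = monarch_fft(list(u) + [0.0] * (n - length), n1, n2)
    k_f = monarch_fft(list(k) + [0.0] * (n - length), n1, n2)
    y = monarch_fft([p * q for p, q in zip(u_f, k_f)], n1, n2, inverse=True)
    return [v.real for v in y[:length]]


def direct_conv(u, k):
    """Reference O(L^2) causal convolution for checking the result."""
    return [sum(k[j] * u[i - j] for j in range(i + 1)) for i in range(len(u))]


# Toy input: length 4, padded to N = 8 = 2 x 4.
u = [1.0, 2.0, 3.0, 4.0]
k = [1.0, 0.0, -1.0, 0.5]
y_fft = fft_conv(u, k, 2, 4)
y_direct = direct_conv(u, k)
```

The two results agree up to floating-point error. The dense DFT matrices here cost more arithmetic than a butterfly FFT, which is the trade the summary describes: the work is shaped as matrix multiplies so that matrix-multiply hardware can execute it.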