
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores

Blog post from Together AI

Post Details
Company: Together AI
Author: Dan Fu, Hermann Kumbong, Eric Nguyen, Chris Ré
Word Count: 1,804
Language: English
Summary

FlashFFTConv is an algorithm for efficiently computing FFT convolutions on GPUs; it speeds up convolutions by up to 7.93x over PyTorch and yields up to 4.4x end-to-end speedup. Traditional FFT algorithms are a poor fit for modern machine learning hardware: they are I/O-bound and make little use of tensor cores, which are specialized for matrix-matrix multiply. FlashFFTConv addresses this with a Monarch decomposition of the FFT, rewriting the convolution as a sequence of matrix-matrix multiplies that run efficiently on tensor cores. The result is faster convolutions and better scaling with sequence length, making the approach well suited to long-sequence tasks such as audio analysis and DNA modeling. FlashFFTConv has been integrated into research codebases and is expected to enable new applications in machine learning.
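To make the two ideas in the summary concrete, here is a minimal NumPy sketch (not the FlashFFTConv implementation, which is a fused CUDA kernel): `fft_conv` shows the standard FFT convolution that FlashFFTConv accelerates, and `monarch_fft` shows the core trick of a Monarch-style decomposition, computing a length-N FFT (N = N1·N2) as two batched matrix multiplies plus a pointwise twiddle correction, i.e. the Cooley-Tukey factorization expressed as matmuls. The function names and the choice of a two-factor split are illustrative assumptions.

```python
import numpy as np

def dft_matrix(n):
    # n x n DFT matrix: F[j, k] = exp(-2*pi*i * j*k / n)
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(-2j * np.pi * j * k / n)

def monarch_fft(x, n1, n2):
    # Length-(n1*n2) FFT as two matrix multiplies plus a twiddle correction.
    # On GPU hardware, both matmuls would map onto tensor cores.
    n = n1 * n2
    assert x.shape[-1] == n
    A = x.reshape(n1, n2)                  # A[a, b] = x[n2*a + b]
    B = dft_matrix(n1) @ A                 # size-n1 DFTs down the columns
    twiddle = np.exp(-2j * np.pi *
                     np.outer(np.arange(n1), np.arange(n2)) / n)
    B = B * twiddle                        # pointwise twiddle factors
    C = B @ dft_matrix(n2)                 # size-n2 DFTs along the rows
    return C.reshape(-1, order="F")        # output index k = k1 + n1*k2

def fft_conv(u, k):
    # Circular convolution via FFT: O(N log N) instead of O(N^2).
    return np.fft.ifft(np.fft.fft(u) * np.fft.fft(k)).real
```

For example, `monarch_fft(x, 4, 4)` on a length-16 signal matches `np.fft.fft(x)`. FlashFFTConv's contribution is fusing this decomposition with the pointwise multiply of the convolution so intermediate results stay on-chip, attacking the I/O bottleneck directly.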