
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores

Blog post from Together AI

Post Details
Company: Together AI
Author: Dan Fu, Hermann Kumbong, Eric Nguyen, Chris Ré
Word Count: 1,804
Language: English
Summary

FlashFFTConv is an algorithm for efficiently computing FFT convolutions on GPUs; it speeds up convolutions by up to 7.93x over PyTorch and yields up to 4.4x end-to-end speedup. Traditional FFT algorithms are a poor fit for modern machine learning hardware: they are I/O-bound and make little use of tensor cores, which are specialized for matrix-matrix multiply. FlashFFTConv addresses this with a Monarch decomposition of the FFT, rewriting the convolution as a sequence of matrix-matrix multiplies that run efficiently on tensor cores. The result is faster convolutions and better scaling with sequence length, making the approach well suited to long-sequence tasks such as audio analysis and DNA modeling. FlashFFTConv has been integrated into research codebases and is expected to enable new applications in machine learning.
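To make the two ideas in the summary concrete, here is a minimal NumPy sketch (not the FlashFFTConv implementation, which is a fused CUDA kernel): `fft_conv` shows the standard FFT convolution that FlashFFTConv accelerates, and `monarch_fft` shows the core trick of a Monarch-style decomposition, computing a length-N FFT (N = N1·N2) as two batched matrix multiplies plus a pointwise twiddle correction, i.e. the Cooley-Tukey factorization expressed as matmuls. The function names and the choice of a two-factor split are illustrative assumptions.

```python
import numpy as np

def dft_matrix(n):
    # n x n DFT matrix: F[j, k] = exp(-2*pi*i * j*k / n)
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(-2j * np.pi * j * k / n)

def monarch_fft(x, n1, n2):
    # Length-(n1*n2) FFT as two matrix multiplies plus a twiddle correction.
    # On GPU hardware, both matmuls would map onto tensor cores.
    n = n1 * n2
    assert x.shape[-1] == n
    A = x.reshape(n1, n2)                  # A[a, b] = x[n2*a + b]
    B = dft_matrix(n1) @ A                 # size-n1 DFTs down the columns
    twiddle = np.exp(-2j * np.pi *
                     np.outer(np.arange(n1), np.arange(n2)) / n)
    B = B * twiddle                        # pointwise twiddle factors
    C = B @ dft_matrix(n2)                 # size-n2 DFTs along the rows
    return C.reshape(-1, order="F")        # output index k = k1 + n1*k2

def fft_conv(u, k):
    # Circular convolution via FFT: O(N log N) instead of O(N^2).
    return np.fft.ifft(np.fft.fft(u) * np.fft.fft(k)).real
```

For example, `monarch_fft(x, 4, 4)` on a length-16 signal matches `np.fft.fft(x)`. FlashFFTConv's contribution is fusing this decomposition with the pointwise multiply of the convolution so intermediate results stay on-chip, attacking the I/O bottleneck directly.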