FlashFFTConv is an algorithm for efficiently computing FFT convolutions on GPUs, speeding up convolutions by up to 7.93x over PyTorch and delivering up to 4.4x end-to-end speedup. It addresses two bottlenecks that make traditional FFT algorithms a poor fit for modern machine learning hardware: costly I/O between GPU memory levels and low utilization of matrix-matrix multiply units. The algorithm uses a Monarch decomposition of the FFT, which rewrites the convolution as a series of matrix-matrix multiplies that can be computed efficiently on tensor cores. This yields faster convolutions and better scaling with sequence length, making it well suited to long-sequence tasks such as audio analysis and DNA modeling. FlashFFTConv has been integrated into research codebases and is expected to enable new applications in machine learning.
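To illustrate the idea behind the decomposition, the sketch below (a minimal NumPy reference, not the FlashFFTConv implementation) factors a length-N FFT with N = N1·N2 into two batched matrix multiplies plus an elementwise "twiddle" correction — the classic Cooley-Tukey identity that Monarch generalizes — and then uses it inside an FFT convolution. The function names and the choice N = 4·8 are illustrative only.

```python
import numpy as np

def dft_matrix(n):
    # DFT matrix F[j, k] = exp(-2*pi*i * j*k / n)
    j = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(j, j) / n)

def decomposed_fft(x, n1, n2):
    # Length n1*n2 FFT as: reshape -> matmul (inner DFTs)
    # -> twiddle scaling -> matmul (outer DFTs) -> reshape.
    # Both heavy steps are matrix multiplies, the operation
    # tensor cores are built for.
    n = n1 * n2
    a = x.reshape(n2, n1).T                # a[j1, j2] = x[j1 + n1*j2]
    b = a @ dft_matrix(n2)                 # inner DFTs (one matmul)
    twiddle = np.exp(-2j * np.pi
                     * np.outer(np.arange(n1), np.arange(n2)) / n)
    c = twiddle * b                        # elementwise twiddle correction
    d = dft_matrix(n1) @ c                 # outer DFTs (second matmul)
    return d.reshape(n)                    # output index k = n2*k1 + k2

def fft_conv(u, k, n1=4, n2=8):
    # Circular convolution via the convolution theorem:
    # y = iFFT(FFT(u) * FFT(k)), with the FFTs computed
    # through the matmul decomposition above.
    U = decomposed_fft(u, n1, n2)
    K = decomposed_fft(k, n1, n2)
    return np.fft.ifft(U * K).real

rng = np.random.default_rng(0)
u = rng.standard_normal(32)
k = rng.standard_normal(32)
reference = np.fft.ifft(np.fft.fft(u) * np.fft.fft(k)).real
assert np.allclose(fft_conv(u, k), reference)
```

In FlashFFTConv the same factorization is fused into a single kernel so the intermediate matrices stay in fast on-chip memory, which is where the I/O savings come from; this sketch only shows the algebraic structure.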