Company
Date Published
Author
Dan Fu and Tri Dao
Word count
1100
Language
English
Hacker News points
None

Summary

FlashConv is a technique for speeding up state space models (SSMs) in deep learning so that they can run faster than optimized implementations of attention. SSMs are a promising alternative to attention because they scale nearly linearly with sequence length rather than quadratically, but out of the box they often run slower due to low FLOP utilization on GPUs. FlashConv speeds up the long convolutions at the heart of SSMs by computing them with Fast Fourier Transforms (FFTs) and fusing the FFT convolution into a single kernel, achieving speeds comparable to or better than attention at long sequence lengths. It also takes advantage of the tensor cores on modern GPUs to further improve performance. With these optimizations, SSMs can be used for large-scale language models, enabling faster training and inference.
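
The core idea behind FFT-based convolution can be illustrated with a minimal sketch: replacing a direct O(N^2) sliding-window convolution with an O(N log N) FFT multiply. This is an illustrative NumPy example, not FlashConv's actual fused CUDA implementation; the function and variable names are hypothetical.

```python
import numpy as np

def fft_conv(u, k):
    """Causal long convolution of input u with kernel k via the FFT.

    Runs in O(N log N) instead of O(N^2) for a direct convolution,
    which is the basic speedup FFT convolution provides for SSM layers.
    """
    n = u.shape[-1]
    fft_size = 2 * n  # zero-pad to avoid circular wrap-around
    u_f = np.fft.rfft(u, n=fft_size)
    k_f = np.fft.rfft(k, n=fft_size)
    y = np.fft.irfft(u_f * k_f, n=fft_size)
    return y[..., :n]  # keep only the causal part of the output

# Example: a length-8192 sequence convolved with an equally long SSM kernel.
rng = np.random.default_rng(0)
u = rng.standard_normal(8192)
k = rng.standard_normal(8192)
y = fft_conv(u, k)
```

FlashConv goes further than this sketch by fusing the FFT, pointwise multiply, and inverse FFT into one GPU kernel and reformulating the FFT so it can use tensor cores, but the asymptotic advantage over direct convolution is the same.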