Company
Date Published
Author
Dan Fu and Tri Dao
Word count
1100
Language
English
Hacker News points
None

Summary

FlashConv is a technique for speeding up state space models (SSMs) in deep learning so that they can run faster than optimized implementations of attention. SSMs are a promising alternative to attention because they scale nearly linearly with sequence length rather than quadratically, but out of the box they often run slower due to low FLOP utilization on GPUs. FlashConv speeds up the long convolutions at the heart of SSMs by computing them with Fast Fourier Transforms (FFTs) and fusing the FFT convolution into a single kernel, achieving speeds comparable to or better than attention at long sequence lengths. It also takes advantage of the tensor cores on modern GPUs to further improve performance. With these optimizations, SSMs can be used for large-scale language models, enabling faster training and inference.
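
The core idea behind FFT-based convolution can be illustrated with a minimal sketch: replacing a direct O(N^2) sliding-window convolution with an O(N log N) FFT multiply. This is an illustrative NumPy example, not FlashConv's actual fused CUDA implementation; the function and variable names are hypothetical.

```python
import numpy as np

def fft_conv(u, k):
    """Causal long convolution of input u with kernel k via the FFT.

    Runs in O(N log N) instead of O(N^2) for a direct convolution,
    which is the basic speedup FFT convolution provides for SSM layers.
    """
    n = u.shape[-1]
    fft_size = 2 * n  # zero-pad to avoid circular wrap-around
    u_f = np.fft.rfft(u, n=fft_size)
    k_f = np.fft.rfft(k, n=fft_size)
    y = np.fft.irfft(u_f * k_f, n=fft_size)
    return y[..., :n]  # keep only the causal part of the output

# Example: a length-8192 sequence convolved with an equally long SSM kernel.
rng = np.random.default_rng(0)
u = rng.standard_normal(8192)
k = rng.standard_normal(8192)
y = fft_conv(u, k)
```

FlashConv goes further than this sketch by fusing the FFT, pointwise multiply, and inverse FFT into one GPU kernel and reformulating the FFT so it can use tensor cores, but the asymptotic advantage over direct convolution is the same.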