Hungry Hungry Hippos: Towards Language Modeling with State Space Models
State space models (SSMs) have demonstrated strong performance on many sequence modeling tasks, but they still underperform attention-based models such as Transformers in language modeling. The gap has two sources: an expressivity gap (SSMs struggle to recall earlier tokens and to compare tokens across the sequence) and poor hardware utilization, which makes SSMs slower to train than Transformers despite their near-linear scaling in sequence length. To close the expressivity gap, the researchers propose H3, a new state space layer designed for recalling earlier tokens and comparing tokens across the sequence; H3 matches attention on synthetic language tasks and comes close to Transformer perplexity on language modeling. To close the hardware gap, they introduce FlashConv, a hardware-aware algorithm for the FFT convolutions at the core of SSM training, which yields up to a 2x speedup on the Long Range Arena benchmark and makes it practical to train larger models at lower cost. Together, these advances enable hybrid H3-attention language models that achieve better perplexity than Transformers and match or outperform them in zero- and few-shot evaluation on a majority of benchmark tasks.
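To make the two ideas concrete, below is a minimal NumPy sketch of an H3-style block under simplifying assumptions: the layer follows the paper's pattern of a shift SSM and a diagonal SSM composed with multiplicative query/key/value-like interactions, but the shift SSM is reduced to a one-step delay, a single head is used, and all parameters are random placeholders rather than learned weights. The plain FFT convolution stands in for the operation FlashConv accelerates with kernel fusion and block FFT; it is not the fused implementation itself.

```python
import numpy as np

def fft_causal_conv(u, k):
    """Causal convolution of u (L, d) with per-channel kernels k (L, d) via FFT.
    This O(L log L) long convolution is the operation FlashConv accelerates."""
    L = u.shape[0]
    n = 2 * L  # zero-pad so the circular FFT convolution becomes a linear one
    U = np.fft.rfft(u, n=n, axis=0)
    K = np.fft.rfft(k, n=n, axis=0)
    return np.fft.irfft(U * K, n=n, axis=0)[:L]

def diag_ssm_kernel(L, d, rng):
    """Convolution kernel of a diagonal SSM, k[t] = C * A^t * B per channel.
    Random stable (decaying) parameters stand in for learned ones."""
    A = -np.exp(rng.standard_normal(d))      # negative => exponentially decaying memory
    B = rng.standard_normal(d)
    C = rng.standard_normal(d)
    t = np.arange(L)[:, None]
    return (C * B) * np.exp(A[None, :] * t)  # (L, d)

def h3_layer(x, Wq, Wk, Wv, Wo, k_diag, shift=1):
    """Simplified single-head H3-style block:
    out = Wo( Q * diag_SSM( shift_SSM(K) * V ) ),
    with the shift SSM reduced to a one-step causal delay for clarity."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    K_shift = np.roll(K, shift, axis=0)
    K_shift[:shift] = 0.0                     # causal: no lookahead past the sequence start
    y = fft_causal_conv(K_shift * V, k_diag)  # long-range mixing over time (recall)
    return (Q * y) @ Wo                       # multiplicative gating (comparison)

# Hypothetical usage with random inputs and weights.
rng = np.random.default_rng(0)
L, d = 128, 64
x = rng.standard_normal((L, d))
Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
out = h3_layer(x, Wq, Wk, Wv, Wo, diag_ssm_kernel(L, d, rng))
print(out.shape)  # (128, 64)
```

The multiplicative structure is what gives the layer its attention-like abilities: the shifted K gated with V lets the diagonal SSM store "log this token" events for later recall, and the final elementwise product with Q lets the current position compare itself against that stored memory.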