Company
Date Published
Author
Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré
Word count
384
Language
English
Hacker News points
None

Summary

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

State space models (SSMs) have demonstrated strong sequence modeling performance, but they underperform attention-based models such as Transformers at language modeling. This gap stems from two shortcomings: SSMs struggle to recall earlier tokens and to compare tokens across the sequence, and they make poor use of hardware during training. To address the first, the researchers propose a new state space model layer, H3, which improves recall of earlier tokens and comparison across the sequence, achieving performance comparable to Transformers on certain tasks. To address the second, they introduce FlashConv, a hardware-aware algorithm for the FFT convolutions at the core of SSMs that yields up to a 2x speedup on long-range tasks and allows larger models to be trained at lower cost. Together, these advances enable hybrid language models that outperform Transformers in zero- and few-shot learning and achieve better perplexity on certain benchmarks.
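To make the mechanics concrete, below is a minimal sketch of the discrete state space recurrence that underlies SSM layers like H3, together with its equivalent formulation as a 1-D convolution, which is the form that FFT-based algorithms such as FlashConv accelerate. The matrices, dimensions, and function names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Minimal sketch of a discrete state space model (SSM):
#   x_t = A x_{t-1} + B u_t
#   y_t = C x_t
# All parameters below are illustrative, not trained values from the paper.
def ssm_recurrence(A, B, C, u):
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:                 # sequential scan over the input
        x = A @ x + B * u_t       # update the hidden state
        ys.append(C @ x)          # read out the output
    return np.array(ys)

# The same SSM computed as a 1-D convolution with the kernel
# K = (CB, CAB, CA^2B, ...); this convolutional view is what makes
# FFT-based acceleration applicable.
def ssm_convolution(A, B, C, u):
    L = len(u)
    K = np.array([C @ np.linalg.matrix_power(A, t) @ B for t in range(L)])
    return np.convolve(u, K)[:L]

rng = np.random.default_rng(0)
d, L = 4, 8
A = 0.5 * np.eye(d)               # stable state matrix (illustrative)
B = rng.standard_normal(d)
C = rng.standard_normal(d)
u = rng.standard_normal(L)
assert np.allclose(ssm_recurrence(A, B, C, u), ssm_convolution(A, B, C, u))
```

The equivalence of the two views is the key design point for SSMs: training can use the parallel convolutional form, while generation can use the cheaper step-by-step recurrent form.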