
Hungry Hungry Hippos: Towards language modeling with state space models

Blog post from Together AI

Post Details
Company
Together AI
Date Published
Author
Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré
Word Count
384
Language
English
Hacker News Points
-
Summary

State space models (SSMs) have demonstrated strong sequence-modeling performance, but they still underperform attention-based models such as Transformers at language modeling, and they train inefficiently due to poor hardware utilization. To close the modeling gap, the researchers propose a new state space layer, H3, that improves an SSM's ability to recall earlier tokens and to compare tokens across the sequence, achieving performance comparable to Transformers on certain tasks. To address efficiency, they introduce FlashConv, a new algorithm that delivers up to a 2x speedup on long-range tasks and allows larger models to be trained at lower cost. Together, these advances enable hybrid language models that outperform Transformers in zero- and few-shot learning and achieve better perplexity on certain benchmarks.
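
To make the H3 idea concrete, here is a minimal, illustrative sketch in PyTorch; it is not the authors' implementation. It assumes each SSM can be viewed as a learned long convolution applied via FFT (the convolutional view that FlashConv fuses for speed), and all names here (`H3Sketch`, `fft_conv`, `shift_kernel`, `diag_kernel`) are hypothetical placeholders.

```python
# Minimal sketch of an H3-style layer: Q * SSM_diag(SSM_shift(K) * V).
# Assumption: each SSM is stood in for by a learned causal long convolution
# applied with FFTs; this is an illustration, not the paper's code.
import torch
import torch.nn as nn


def fft_conv(u, k):
    """Causal long convolution of u (batch, seq, dim) with kernel k (seq, dim)."""
    L = u.shape[1]
    n = 2 * L  # zero-pad so the circular FFT convolution becomes a linear one
    u_f = torch.fft.rfft(u, n=n, dim=1)
    k_f = torch.fft.rfft(k, n=n, dim=0)
    return torch.fft.irfft(u_f * k_f, n=n, dim=1)[:, :L]


class H3Sketch(nn.Module):
    """Multiplicative gating between Q, K, V lets the layer recall earlier
    tokens (shift SSM on K) and compare tokens across the sequence (gating by Q)."""

    def __init__(self, d_model, max_len):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # Learned convolution kernels standing in for the shift and diagonal SSMs.
        self.shift_kernel = nn.Parameter(torch.randn(max_len, d_model) * 0.02)
        self.diag_kernel = nn.Parameter(torch.randn(max_len, d_model) * 0.02)

    def forward(self, x):                                    # x: (batch, seq, dim)
        L = x.shape[1]
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        k = fft_conv(k, self.shift_kernel[:L])               # "remember" past tokens
        kv = fft_conv(k * v, self.diag_kernel[:L])           # aggregate over time
        return self.out_proj(q * kv)                         # gate with Q to compare


if __name__ == "__main__":
    layer = H3Sketch(d_model=64, max_len=128)
    out = layer(torch.randn(2, 128, 64))
    print(out.shape)  # torch.Size([2, 128, 64])
```

In the paper's framing, the FFT-based convolutions above are exactly the operations FlashConv speeds up by fusing them into blocked, hardware-friendly kernels; this sketch only shows the unfused, straightforward version.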