Hungry Hungry Hippos: Towards Language Modeling with State Space Models
State space models (SSMs) have demonstrated strong performance on many sequence modeling tasks, but they still underperform attention-based models such as Transformers in language modeling. The gap has two sources: an expressivity gap (SSMs struggle to recall earlier tokens and to compare tokens across the sequence) and poor hardware utilization, which makes SSMs slower to train than Transformers despite their near-linear scaling in sequence length. To close the expressivity gap, the researchers propose H3, a new state space layer designed for recalling earlier tokens and comparing tokens across the sequence; H3 matches attention on synthetic language tasks and comes close to Transformer perplexity on language modeling. To close the hardware gap, they introduce FlashConv, a hardware-aware algorithm for the FFT convolutions at the core of SSM training, which yields up to a 2x speedup on the Long Range Arena benchmark and makes it practical to train larger models at lower cost. Together, these advances enable hybrid H3-attention language models that achieve better perplexity than Transformers and match or outperform them in zero- and few-shot evaluation on a majority of benchmark tasks.
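To make the two ideas concrete, below is a minimal NumPy sketch of an H3-style block under simplifying assumptions: the layer follows the paper's pattern of a shift SSM and a diagonal SSM composed with multiplicative query/key/value-like interactions, but the shift SSM is reduced to a one-step delay, a single head is used, and all parameters are random placeholders rather than learned weights. The plain FFT convolution stands in for the operation FlashConv accelerates with kernel fusion and block FFT; it is not the fused implementation itself.

```python
import numpy as np

def fft_causal_conv(u, k):
    """Causal convolution of u (L, d) with per-channel kernels k (L, d) via FFT.
    This O(L log L) long convolution is the operation FlashConv accelerates."""
    L = u.shape[0]
    n = 2 * L  # zero-pad so the circular FFT convolution becomes a linear one
    U = np.fft.rfft(u, n=n, axis=0)
    K = np.fft.rfft(k, n=n, axis=0)
    return np.fft.irfft(U * K, n=n, axis=0)[:L]

def diag_ssm_kernel(L, d, rng):
    """Convolution kernel of a diagonal SSM, k[t] = C * A^t * B per channel.
    Random stable (decaying) parameters stand in for learned ones."""
    A = -np.exp(rng.standard_normal(d))      # negative => exponentially decaying memory
    B = rng.standard_normal(d)
    C = rng.standard_normal(d)
    t = np.arange(L)[:, None]
    return (C * B) * np.exp(A[None, :] * t)  # (L, d)

def h3_layer(x, Wq, Wk, Wv, Wo, k_diag, shift=1):
    """Simplified single-head H3-style block:
    out = Wo( Q * diag_SSM( shift_SSM(K) * V ) ),
    with the shift SSM reduced to a one-step causal delay for clarity."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    K_shift = np.roll(K, shift, axis=0)
    K_shift[:shift] = 0.0                     # causal: no lookahead past the sequence start
    y = fft_causal_conv(K_shift * V, k_diag)  # long-range mixing over time (recall)
    return (Q * y) @ Wo                       # multiplicative gating (comparison)

# Hypothetical usage with random inputs and weights.
rng = np.random.default_rng(0)
L, d = 128, 64
x = rng.standard_normal((L, d))
Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
out = h3_layer(x, Wq, Wk, Wv, Wo, diag_ssm_kernel(L, d, rng))
print(out.shape)  # (128, 64)
```

The multiplicative structure is what gives the layer its attention-like abilities: the shifted K gated with V lets the diagonal SSM store "log this token" events for later recall, and the final elementwise product with Q lets the current position compare itself against that stored memory.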