Hyena Hierarchy proposes a subquadratic drop-in replacement for attention in large Transformers, built by interleaving implicitly parameterized long convolutions with data-controlled gating. On recall and reasoning tasks over sequences of thousands of tokens, the proposed Hyena operators improve accuracy by more than 50 points over operators relying on state spaces and other implicit and explicit methods, and they match the quality of attention-based models on language modeling while requiring 20% less training compute at sequence length 2K. Hyena operators are also faster than highly optimized attention at longer sequence lengths (roughly twice as fast at 8K and up to 100x faster at 64K), making them well suited to large-scale language modeling. The dataset provided with the article is conceptualized as a foundation for building high-quality datasets, and should be filtered according to the intended application and quality signals.
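
To make the "long convolutions with data-controlled gating" idea concrete, here is a minimal sketch of the Hyena recurrence: the input is projected into a value stream and several gate streams, and the operator alternates FFT-based long convolutions with elementwise gating by those projections. This assumes PyTorch; the class name `HyenaOperatorSketch`, the explicit (rather than implicitly parameterized) filters, and the omission of the short depthwise convolution on the projections are simplifications for illustration, not the reference implementation.

```python
# Minimal sketch of a Hyena-style operator, assuming PyTorch.
# Illustrative only: filters are explicit parameters here, whereas the paper
# parameterizes them implicitly with a small FFN over positional encodings.
import torch
import torch.nn as nn


def fft_long_conv(z, h):
    """Causal long convolution of z (batch, length, dim) with filter h (length, dim) via FFT."""
    L = z.shape[1]
    # Zero-pad to 2L so the circular FFT convolution equals a causal linear convolution.
    z_f = torch.fft.rfft(z, n=2 * L, dim=1)
    h_f = torch.fft.rfft(h, n=2 * L, dim=0)
    return torch.fft.irfft(z_f * h_f.unsqueeze(0), n=2 * L, dim=1)[:, :L]


class HyenaOperatorSketch(nn.Module):
    """Order-N Hyena recurrence: z <- x_n * (h_n convolved with z), alternating
    data-controlled gating (elementwise product with input projections) with
    long convolutions."""

    def __init__(self, dim, max_len, order=2):
        super().__init__()
        self.order = order
        # One value stream v and `order` gate streams x_1..x_N, all projected from the input.
        self.in_proj = nn.Linear(dim, dim * (order + 1))
        self.out_proj = nn.Linear(dim, dim)
        # Explicit long filters, one per recurrence step (a simplification).
        self.filters = nn.Parameter(torch.randn(order, max_len, dim) / max_len)

    def forward(self, u):                         # u: (batch, length, dim)
        L = u.shape[1]
        streams = self.in_proj(u).chunk(self.order + 1, dim=-1)
        v, gates = streams[0], streams[1:]
        z = v
        for n in range(self.order):
            # Long convolution followed by data-controlled (input-dependent) gating.
            z = gates[n] * fft_long_conv(z, self.filters[n, :L])
        return self.out_proj(z)


# Usage: a token-mixing layer at sequence length 1024.
x = torch.randn(2, 1024, 64)
y = HyenaOperatorSketch(dim=64, max_len=1024)(x)
print(y.shape)  # torch.Size([2, 1024, 64])
```

Because the filters span the full sequence and the convolutions are evaluated with FFTs, each layer costs O(L log L) in the sequence length rather than the O(L^2) of dense attention, which is where the claimed speedups at long sequence lengths come from.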