
Hyena Hierarchy: Towards larger convolutional language models

Blog post from Together AI

Post Details

Company: Together AI
Date Published: -
Authors: Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré
Word Count: 291
Language: English
Hacker News Points: -
Summary

Hyena Hierarchy proposes Hyena, a subquadratic drop-in replacement for the attention operator in large Transformers, built by interleaving implicitly parameterized long convolutions with data-controlled gating. On recall and reasoning tasks over long sequences, Hyena operators improve accuracy by more than 50 points over other implicit and explicit subquadratic methods and match the performance of attention-based models, while reaching Transformer quality on language modeling with 20% less training compute at sequence length 2K. Hyena operators are also faster than highly optimized attention at longer sequence lengths, making them a promising building block for large-scale language modeling.
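To make "long convolutions with data-controlled gating" concrete, here is a minimal NumPy sketch of the order-N Hyena recurrence described in the paper: a value projection of the input is repeatedly passed through a long (sequence-length) convolution and then gated elementwise by another input projection. The shapes, function names, and random parameters are illustrative assumptions; in the actual method the filters are parameterized implicitly by a small network rather than stored as explicit arrays.

```python
import numpy as np

def fft_long_conv(u, k):
    """Causal long convolution of u with kernel k via FFT, O(L log L) per channel."""
    L = u.shape[-1]
    n = 2 * L  # zero-pad so the circular FFT convolution acts as a linear, causal one
    y = np.fft.irfft(np.fft.rfft(u, n=n) * np.fft.rfft(k, n=n), n=n)
    return y[..., :L]

def hyena_operator(u_projs, filters):
    """Order-N Hyena recurrence: z_{n+1} = x_n * (h_n convolved with z_n).

    u_projs : N+1 linear projections of the input, each of shape (d, L);
              the last plays the role of the value v (z_1 = v).
    filters : N long convolution kernels, each of shape (d, L); random here,
              implicitly parameterized in the real method.
    """
    z = u_projs[-1]
    for x, h in zip(u_projs[:-1], filters):
        z = x * fft_long_conv(z, h)  # long convolution, then data-controlled gating
    return z

# Illustrative usage with random projections and filters (hypothetical sizes).
d, L, order = 8, 1024, 2
rng = np.random.default_rng(0)
u_projs = [rng.standard_normal((d, L)) for _ in range(order + 1)]
filters = [rng.standard_normal((d, L)) * 0.01 for _ in range(order)]
y = hyena_operator(u_projs, filters)
print(y.shape)  # (8, 1024)
```

Because every step is either an elementwise product or an FFT-based convolution, the whole operator runs in O(N L log L) rather than the O(L^2) of dense attention, which is the source of the speedups at long sequence lengths cited above.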