Hyena Hierarchy proposes a subquadratic drop-in replacement for attention in large Transformers, built by interleaving implicitly parameterized long convolutions with data-controlled gating. On recall and reasoning tasks over sequences of thousands of tokens, the proposed Hyena operators improve accuracy by more than 50 points over operators relying on state spaces and other implicit and explicit methods, and they match the quality of attention-based models on language modeling while requiring 20% less training compute at sequence length 2K. Hyena operators are also faster than highly optimized attention at longer sequence lengths (roughly twice as fast at 8K and up to 100x faster at 64K), making them well suited to large-scale language modeling. The dataset provided with the article is conceptualized as a foundation for building high-quality datasets, and should be filtered according to the intended application and quality signals.
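
To make the "long convolutions with data-controlled gating" idea concrete, here is a minimal sketch of the Hyena recurrence: the input is projected into a value stream and several gate streams, and the operator alternates FFT-based long convolutions with elementwise gating by those projections. This assumes PyTorch; the class name `HyenaOperatorSketch`, the explicit (rather than implicitly parameterized) filters, and the omission of the short depthwise convolution on the projections are simplifications for illustration, not the reference implementation.

```python
# Minimal sketch of a Hyena-style operator, assuming PyTorch.
# Illustrative only: filters are explicit parameters here, whereas the paper
# parameterizes them implicitly with a small FFN over positional encodings.
import torch
import torch.nn as nn


def fft_long_conv(z, h):
    """Causal long convolution of z (batch, length, dim) with filter h (length, dim) via FFT."""
    L = z.shape[1]
    # Zero-pad to 2L so the circular FFT convolution equals a causal linear convolution.
    z_f = torch.fft.rfft(z, n=2 * L, dim=1)
    h_f = torch.fft.rfft(h, n=2 * L, dim=0)
    return torch.fft.irfft(z_f * h_f.unsqueeze(0), n=2 * L, dim=1)[:, :L]


class HyenaOperatorSketch(nn.Module):
    """Order-N Hyena recurrence: z <- x_n * (h_n convolved with z), alternating
    data-controlled gating (elementwise product with input projections) with
    long convolutions."""

    def __init__(self, dim, max_len, order=2):
        super().__init__()
        self.order = order
        # One value stream v and `order` gate streams x_1..x_N, all projected from the input.
        self.in_proj = nn.Linear(dim, dim * (order + 1))
        self.out_proj = nn.Linear(dim, dim)
        # Explicit long filters, one per recurrence step (a simplification).
        self.filters = nn.Parameter(torch.randn(order, max_len, dim) / max_len)

    def forward(self, u):                         # u: (batch, length, dim)
        L = u.shape[1]
        streams = self.in_proj(u).chunk(self.order + 1, dim=-1)
        v, gates = streams[0], streams[1:]
        z = v
        for n in range(self.order):
            # Long convolution followed by data-controlled (input-dependent) gating.
            z = gates[n] * fft_long_conv(z, self.filters[n, :L])
        return self.out_proj(z)


# Usage: a token-mixing layer at sequence length 1024.
x = torch.randn(2, 1024, 64)
y = HyenaOperatorSketch(dim=64, max_len=1024)(x)
print(y.shape)  # torch.Size([2, 1024, 64])
```

Because the filters span the full sequence and the convolutions are evaluated with FFTs, each layer costs O(L log L) in the sequence length rather than the O(L^2) of dense attention, which is where the claimed speedups at long sequence lengths come from.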