Company: Together
Date Published:
Author:
Word count: 1712
Language: English
Hacker News points: 221

Summary

The StripedHyena models, StripedHyena-Hessian-7B (SH 7B) and StripedHyena-Nous-7B (SH-N 7B), are an alternative to the dominant Transformer architecture, offering improved efficiency in training, inference, and memory usage. They build on a line of research into efficient sequence-modeling architectures, including H3, Hyena, HyenaDNA, and Monarch Mixer. StripedHyena achieves performance comparable to state-of-the-art Transformers on short- and long-context evaluations, is faster and more memory-efficient for long-sequence training, fine-tuning, and generation, and has a reduced memory footprint during autoregressive generation. Architecturally, the models are a hybrid of attention and gated convolutions arranged in Hyena operators (see the sketch below), and they were designed in part via model grafting techniques. StripedHyena is the first alternative architecture competitive with strong Transformer base models of the same size or larger at scale, and it can serve as a generalist baseline for long-context tasks. The aim is to push model architectures beyond Transformers and to inspire the open-source community to explore new builds with diverse architectures.
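To make the "striped" hybrid concrete, here is a minimal PyTorch sketch that interleaves a gated-convolution block with a causal attention block. Every module name, dimension, and the exact layer pattern here are illustrative assumptions, not the released StripedHyena implementation; in particular, the real Hyena operators use implicitly parameterized long convolutions, whereas this sketch substitutes a short depthwise causal convolution for readability.

```python
# Illustrative sketch of a striped attention/gated-convolution stack.
# NOT the StripedHyena codebase: names, sizes, and the stripe pattern are assumptions.
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    """Simplified gated convolution: project, causally convolve, gate.
    (The actual Hyena operators use implicit long convolutions.)"""
    def __init__(self, dim: int, kernel_size: int = 7):
        super().__init__()
        self.in_proj = nn.Linear(dim, 2 * dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size - 1, groups=dim)  # depthwise
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (batch, seq, dim)
        u, gate = self.in_proj(x).chunk(2, dim=-1)
        u = self.conv(u.transpose(1, 2))[..., : x.size(1)]  # trim right pad -> causal
        return self.out_proj(u.transpose(1, 2) * torch.sigmoid(gate))

class AttentionBlock(nn.Module):
    """Standard causal multi-head self-attention."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        n = x.size(1)
        mask = torch.triu(torch.ones(n, n, dtype=torch.bool, device=x.device),
                          diagonal=1)           # True = position is masked out
        return self.attn(x, x, x, attn_mask=mask)[0]

class StripedBlock(nn.Module):
    """One 'stripe': gated convolution followed by attention, pre-norm residuals."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm1, self.conv = nn.LayerNorm(dim), GatedConvBlock(dim)
        self.norm2, self.attn = nn.LayerNorm(dim), AttentionBlock(dim)

    def forward(self, x):
        x = x + self.conv(self.norm1(x))
        return x + self.attn(self.norm2(x))

model = nn.Sequential(*[StripedBlock(256) for _ in range(4)])
print(model(torch.randn(2, 128, 256)).shape)    # torch.Size([2, 128, 256])
```

Roughly, the design rationale is that the convolution layers admit equivalent recurrent and state-space forms, as described in the Hyena line of work, which is what enables the reduced, near-constant memory footprint during autoregressive generation, while the attention stripes retain attention's strength at precise token recall.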