Company
Together AI
Date Published
Author
Dan Fu, Simran Arora, Chris Ré
Word count
1981
Language
English
Hacker News points
None

Summary

Researchers at Together AI have developed a new model architecture called Monarch Mixer (M2), which aims to increase efficiency while maintaining the quality of Transformers. M2 is sub-quadratic in both sequence length and model dimension: it replaces the Transformer's attention and MLP blocks with operations built from structured Monarch matrices, allowing it to scale to longer sequences and larger models while training faster. The first target for M2 is BERT, one of the most widely used models for language tasks; M2-BERT matches BERT's quality while being 25% more parameter-efficient. The researchers have also explored long-sequence variants of Monarch Mixer, which could scale to much longer sequences without a significant loss in quality. The code and checkpoints for M2-BERT are now available on GitHub, and further releases and updates are planned in the coming weeks.
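
As a rough illustration of the structured matrices behind M2 (not the authors' released code, which is in the GitHub repository), below is a minimal PyTorch sketch of a Monarch matrix multiply. It assumes the square-grid Monarch factorization M = P·B2·P·B1, where B1 and B2 are block-diagonal and P is the permutation that transposes a √n × √n grid; the function name and shapes here are illustrative only.

```python
import torch


def monarch_matmul(x: torch.Tensor, blocks1: torch.Tensor, blocks2: torch.Tensor) -> torch.Tensor:
    """Multiply x by a Monarch matrix M = P @ B2 @ P @ B1 (sub-quadratic).

    x:       (..., n) input, with n = m * m
    blocks1: (m, m, m) -- m diagonal blocks of the first factor B1
    blocks2: (m, m, m) -- m diagonal blocks of the second factor B2
    """
    m = blocks1.shape[0]
    *batch, n = x.shape
    assert n == m * m, "input dim must equal block size squared"

    # B1: m independent (m x m) block multiplies
    x = x.reshape(*batch, m, m)
    x = torch.einsum("bij,...bj->...bi", blocks1, x)

    # P: permute by transposing the (m, m) grid
    x = x.transpose(-1, -2)

    # B2: second block-diagonal factor
    x = torch.einsum("bij,...bj->...bi", blocks2, x)

    # P again (the transpose permutation is its own inverse)
    x = x.transpose(-1, -2)
    return x.reshape(*batch, n)


# Usage: n = 64 = 8 * 8, where a dense layer would need a full 64 x 64 matmul
x = torch.randn(4, 64)                 # batch of 4 vectors
b1 = torch.randn(8, 8, 8) / 8 ** 0.5   # hypothetical learned block factors
b2 = torch.randn(8, 8, 8) / 8 ** 0.5
y = monarch_matmul(x, b1, b2)          # shape (4, 64)
```

For a dimension-n input, the two block-diagonal factors cost O(n√n) multiply-adds instead of the O(n²) of a dense layer, which is the kind of sub-quadratic scaling in model dimension (and, applied along the sequence axis, in sequence length) that the summary refers to.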