Company
Date Published
Author
Albert Gu
Word count
560
Language
English
Hacker News points
None

Summary

Mamba-3B-SlimPJ, the latest language model developed through a partnership between Cartesia and Together, is built on the Mamba architecture and offers a highly efficient alternative to traditional Transformer models. With 2.8 billion parameters trained on 600 billion tokens, it matches the quality of leading 3B Transformer models such as BTLM-3B-8K while using 17% fewer training FLOPs, and its state-space design gives it linear scaling in sequence length and fast inference. Released under the Apache 2.0 license, Mamba-3B-SlimPJ was trained on the SlimPajama dataset with the GPT-NeoX tokenizer and evaluated on a range of downstream tasks under both zero-shot and five-shot protocols. The release is intended to provide a strong base model for further experimentation across domains including language, audio, and video, and to foster open-source collaboration. Cartesia, led by Chief Scientist Albert Gu, continues to explore next-generation architectures such as state space models to push the boundaries of AI capabilities, and invites interested researchers and engineers to join the effort.
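
For readers who want to try the model, below is a minimal sketch of how it might be loaded and sampled using the open-source `mamba-ssm` package (with its `causal-conv1d` dependency installed) and the GPT-NeoX tokenizer mentioned above. The Hugging Face repository id, the prompt, and the generation settings are illustrative assumptions and are not taken from the post; the generate call follows the benchmarking script in the state-spaces/mamba repository and may differ across versions.

```python
# Sketch: load a Mamba 2.8B SlimPajama checkpoint and generate a short completion.
# Requires a CUDA GPU, plus the mamba-ssm and causal-conv1d packages.
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"

# The model was trained with the GPT-NeoX tokenizer, so reuse EleutherAI's copy.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# Hypothetical repo id for the 2.8B SlimPajama checkpoint (an assumption, not from the post).
model = MambaLMHeadModel.from_pretrained(
    "state-spaces/mamba-2.8b-slimpj", device=device, dtype=torch.float16
)

prompt = "State space models are"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Greedy decoding; by default generate() returns the generated token ids.
out = model.generate(input_ids=input_ids, max_length=64)
print(tokenizer.decode(out[0]))
```

The same checkpoint can also serve as a base model for fine-tuning or for benchmark evaluation (e.g. the zero-shot and five-shot tasks mentioned in the post), since only the tokenizer and weights are needed to reproduce the setup.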