Company
Date Published
Author
Albert Gu
Word count
560
Language
English
Hacker News points
None

Summary

Mamba-3B-SlimPJ, the latest language model developed through a partnership between Cartesia and Together, is built on the Mamba architecture and offers a highly efficient alternative to traditional Transformer models. With 2.8 billion parameters trained on 600 billion tokens, it matches the quality of leading 3B Transformer models such as BTLM-3B-8K while using 17% fewer training FLOPs, and its state-space design gives it linear scaling in sequence length and fast inference. Released under the Apache 2.0 license, Mamba-3B-SlimPJ was trained on the SlimPajama dataset with the GPT-NeoX tokenizer and evaluated on a range of downstream tasks under both zero-shot and five-shot protocols. The release is intended to provide a strong base model for further experimentation across domains including language, audio, and video, and to foster open-source collaboration. Cartesia, led by Chief Scientist Albert Gu, continues to explore next-generation architectures such as state space models to push the boundaries of AI capabilities, and invites interested researchers and engineers to join the effort.
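
For readers who want to try the model, below is a minimal sketch of how it might be loaded and sampled using the open-source `mamba-ssm` package (with its `causal-conv1d` dependency installed) and the GPT-NeoX tokenizer mentioned above. The Hugging Face repository id, the prompt, and the generation settings are illustrative assumptions and are not taken from the post; the generate call follows the benchmarking script in the state-spaces/mamba repository and may differ across versions.

```python
# Sketch: load a Mamba 2.8B SlimPajama checkpoint and generate a short completion.
# Requires a CUDA GPU, plus the mamba-ssm and causal-conv1d packages.
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"

# The model was trained with the GPT-NeoX tokenizer, so reuse EleutherAI's copy.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# Hypothetical repo id for the 2.8B SlimPajama checkpoint (an assumption, not from the post).
model = MambaLMHeadModel.from_pretrained(
    "state-spaces/mamba-2.8b-slimpj", device=device, dtype=torch.float16
)

prompt = "State space models are"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Greedy decoding; by default generate() returns the generated token ids.
out = model.generate(input_ids=input_ids, max_length=64)
print(tokenizer.decode(out[0]))
```

The same checkpoint can also serve as a base model for fine-tuning or for benchmark evaluation (e.g. the zero-shot and five-shot tasks mentioned in the post), since only the tokenizer and weights are needed to reproduce the setup.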