Mamba-3B-SlimPJ has emerged as a strong contender to Transformers: it scales linearly in sequence length, offers fast inference, and rivals some of the best 3B Transformer architectures. The model was trained on 600B tokens of the SlimPajama dataset and is released under the Apache 2.0 license. Training reused the hyperparameters of Mamba-3B on the Pile, but with a longer learning-rate decay schedule to accommodate the larger token budget. In evaluations, Mamba-3B-SlimPJ matches the quality of very strong Transformers such as BTLM-3B-8K while using 17% fewer training FLOPs. Mamba remains a promising architecture for building foundation models across diverse domains such as language, genomics, audio, and video, and the released base model is a good starting point for experimentation, analysis, and chat or instruction-tuned derivatives.
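To make the inference claim concrete, below is a minimal generation sketch using the `mamba_ssm` package's `MambaLMHeadModel` interface. The Hub checkpoint ID `state-spaces/mamba-2.8b-slimpj` and the use of the GPT-NeoX tokenizer are assumptions about how the weights are published, so adjust them to match the actual release.

```python
# Minimal sketch: greedy generation with the SlimPajama-trained Mamba checkpoint.
# Assumptions (verify against the actual release): the weights are available on the
# Hugging Face Hub as "state-spaces/mamba-2.8b-slimpj" and use the GPT-NeoX tokenizer.
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = MambaLMHeadModel.from_pretrained(
    "state-spaces/mamba-2.8b-slimpj",  # assumed checkpoint ID
    device=device,
    dtype=dtype,
)

prompt = "Selective state-space models are"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Decoding advances a fixed-size recurrent state rather than a growing KV cache,
# which is where Mamba's linear scaling and fast-inference properties come from.
out = model.generate(
    input_ids=input_ids,
    max_length=input_ids.shape[1] + 64,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The same loading pattern can serve as the base for downstream fine-tuning (e.g. chat or instruction tuning); only the generation call would be replaced by a standard training loop.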