Mamba-3
Blog post from Together AI
Mamba-3 is a newly developed state space model (SSM) that prioritizes inference efficiency, where Mamba-2 focused on training speed. Its key enhancements are a more expressive recurrence formula, complex-valued state tracking, and a multi-input, multi-output (MIMO) variant, which together improve accuracy without increasing decoding latency; each idea is sketched in the toy examples below. At the 1.5B scale, Mamba-3 outperforms its predecessor Mamba-2 as well as other models such as Gated DeltaNet and Llama-3.2-1B, particularly in prefill and decode latency.

These improvements are inspired by classical control theory, and the accompanying kernels, developed in Triton, TileLang, and CuTe DSL, are tuned to make full use of modern hardware. The architecture has also been updated to align with modern language models, incorporating new components such as RoPE and the MIMO projections without giving up speed. Mamba-3 is particularly effective on language modeling tasks, and its fixed-size state keeps it competitive on retrieval tasks.

The development of Mamba-3 was prompted by the ongoing shift in focus from training to inference in large language models: the aim is to push the quality-efficiency frontier by letting better models run faster. The kernels are open-sourced to encourage further exploration and development built on Mamba-3's architecture.
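The post describes Mamba-3's recurrence as "more expressive" and control-theory-inspired. One standard control-theory upgrade over the first-order (Euler/ZOH-style) step used by earlier Mamba versions is the trapezoidal rule, which averages the derivative at both endpoints and therefore also weighs the previous input. The sketch below is a generic numerical illustration of that idea under those assumptions, not Mamba-3's exact formula; all names and constants here are hypothetical.

```python
# Illustrative only: two ways to discretize the continuous SSM
#   h'(t) = a*h(t) + b*x(t).
# Prior Mamba versions use a first-order step; a trapezoidal step is one
# example of a "more expressive" recurrence, since the current AND the
# previous input both appear in the update. Generic numerical-analysis
# sketch, not Mamba-3's exact parameterization.
import numpy as np

a, b, dt, T = -1.0, 1.0, 0.1, 100
x = np.sin(0.3 * np.arange(T))  # arbitrary input signal

h_euler = np.zeros(T)
h_trap = np.zeros(T)
for t in range(1, T):
    # Euler: h_t = (1 + dt*a) * h_{t-1} + dt*b * x_t
    h_euler[t] = (1 + dt * a) * h_euler[t - 1] + dt * b * x[t]
    # Trapezoid: average the derivative at t-1 and t, then solve for h_t.
    # The recurrence absorbs a short input convolution "for free".
    denom = 1 - dt * a / 2
    h_trap[t] = ((1 + dt * a / 2) * h_trap[t - 1]
                 + dt * b / 2 * (x[t - 1] + x[t])) / denom
```

Because the trapezoidal update already mixes x_{t-1} and x_t, a recurrence of this shape can subsume the kind of short causal convolution that earlier Mamba blocks added as a separate component.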
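The summary also mentions complex-valued state tracking alongside RoPE. A cheap way to realize a complex diagonal state is to pair up real state dimensions and apply a 2x2 rotation, the same trick RoPE applies to queries and keys. The toy below, with a hypothetical data-dependent angle, shows why a rotating state can track parity, something a purely decaying nonnegative real state cannot represent; it is a minimal sketch of the concept, not the blog post's parameterization.

```python
# Illustrative only: why a complex (rotation) state helps with state
# tracking. A real diagonal SSM multiplies its state by a scalar in
# (0, 1), so it can only decay; pairing dimensions and rotating them
# lets the state hold a phase. With an angle of pi per "1" token, the
# state's sign tracks the parity of the 1s seen so far. Hypothetical toy.
import numpy as np

def rotate(state, theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([c * state[0] - s * state[1],
                     s * state[0] + c * state[1]])

tokens = [1, 0, 1, 1, 0, 1]
state = np.array([1.0, 0.0])            # phase 0 encodes "even so far"
for tok in tokens:
    state = rotate(state, np.pi * tok)  # rotate a half-turn on each 1

parity = int(state[0] < 0)              # sign of the first component
print(parity)  # four 1s seen, so parity is even and this prints 0
```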
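Finally, the MIMO variant can be pictured against the rank-1 state update of Mamba-2-style SSMs. The shapes and names below are assumptions chosen for illustration, not the post's exact parameterization, but they show the key point: the state stays the same size while each step writes more input channels, raising arithmetic intensity during decoding.

```python
# Illustrative only: rank-1 ("single-input") vs. rank-r MIMO state update.
# Both keep the same fixed-size state H (N x P); the MIMO update writes r
# input channels per step, doing more useful FLOPs per byte of state moved,
# which is what raises arithmetic intensity at decode time. Shapes are
# assumptions for illustration.
import numpy as np

N, P, r = 16, 64, 4                       # state rows, head dim, MIMO rank
rng = np.random.default_rng(0)
H = np.zeros((N, P))
a_t = 0.9                                 # scalar decay for simplicity

# Rank-1 update: H += outer(b_t, x_t), with b_t in R^N, x_t in R^P.
b_t, x_t = rng.standard_normal(N), rng.standard_normal(P)
H = a_t * H + np.outer(b_t, x_t)

# MIMO update: H += B_t @ X_t, with B_t in R^{N x r}, X_t in R^{r x P}.
# Same state size, r times the input bandwidth per step.
B_t, X_t = rng.standard_normal((N, r)), rng.standard_normal((r, P))
H = a_t * H + B_t @ X_t

# Readout is unchanged in shape: y_t = C_t^T H, a vector in R^P.
C_t = rng.standard_normal(N)
y_t = C_t @ H
```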