NVIDIA Nemotron 3 Super on DeepInfra: 120B MoE Model
Blog post from Deepinfra
DeepInfra's Nemotron 3 Super is a cutting-edge model developed by NVIDIA, featuring a 120 billion parameter architecture that combines Mamba-2, Mixture-of-Experts routing, and attention layers under a novel LatentMoE framework, activating only 12 billion parameters per token for efficiency. The model's prowess is demonstrated by its impressive performance on the RULER benchmark, especially at long context lengths of up to 1 million tokens, surpassing competitors like GPT-OSS-120B. Pre-trained on 25 trillion tokens across diverse domains, Nemotron 3 Super is designed for both deep reasoning and conversational tasks, with a configurable reasoning mode that can be toggled as needed. It offers significant advantages in multi-agent pipelines and complex workflows due to its efficient compute budgeting and agentic scaffolding capabilities. Available on DeepInfra's platform, it supports various API integrations and is priced on a usage-based model, making it accessible for scalable deployment in diverse applications.