NVIDIA Nemotron 3 Super on DeepInfra: 120B MoE Model

Post Details

Company

DeepInfra

Date Published

May 25, 2026

Author

Deep

Word Count

1,486

Company Posts That Month

23

Language

English

Hacker News Points

-

Post removed?

No

Source URL

deepinfra.com/blog/nvidia-nemotron-3-super-deepinfra

Summary

Nemotron 3 Super is a 120-billion-parameter model developed by NVIDIA and available on DeepInfra. Its architecture combines Mamba-2, Mixture-of-Experts routing, and attention layers under a novel LatentMoE framework, and it activates only 12 billion parameters per token for efficiency. On the RULER benchmark it surpasses competitors such as GPT-OSS-120B, especially at long context lengths of up to 1 million tokens. Pre-trained on 25 trillion tokens across diverse domains, the model is designed for both deep reasoning and conversational tasks, with a configurable reasoning mode that can be toggled as needed. Its efficient compute budgeting and agentic scaffolding capabilities make it well suited to multi-agent pipelines and complex workflows. On DeepInfra's platform it supports various API integrations and is priced on a usage basis, which makes it accessible for scalable deployment.

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Reinforcement learning	2	90	44	24	-13%
LLM	1	9,074	1,640	224	+53%
Multi-agent systems	1	546	198	78	+19%
Vector Search	1	2,268	422	128	+30%
