Introducing NVIDIA Nemotron 3 Ultra: The Nemotron 3.x family is here!

Blog post from Baseten

Post Details
Company: Baseten
Author: Marylise Tauzia
Word Count: 1,638
Language: English
Summary

Nemotron 3 Ultra is a mixture-of-experts (MoE) language model from NVIDIA, built to improve the performance of long-running autonomous agents. It combines Mamba layers with traditional attention layers, relying on Mamba for efficiency and on attention for precision. Because the computational cost of a Mamba layer grows linearly with context length rather than quadratically, the model's processing speed stays roughly consistent as a task gets longer. According to the post, Nemotron 3 Ultra delivers up to five times faster inference and up to 30% lower costs than other open frontier models, which makes it well suited to coding, deep research, enterprise workflows, and chip design, where agent efficiency over extended operations is crucial. The model is fully open-source, so users can run it on their own infrastructure. NVIDIA's release strategy prioritizes task completion over single-turn benchmarks, positioning Nemotron 3 Ultra as a practical choice for real-world applications where speed and reliability are critical.
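The linear-versus-quadratic point can be made concrete with a toy cost model. The sketch below is an illustration only: the function names, the default `d_model` and `d_state` values, and the operation counts are assumptions chosen for the example, not figures from NVIDIA or Baseten, and they ignore constants, MoE routing, and hardware effects.

```python
def attention_cost(context_len: int, d_model: int) -> int:
    """Rough multiply-add count for full self-attention: grows as n^2 * d.

    Hypothetical cost model: every token attends to every other token.
    """
    return context_len * context_len * d_model


def state_space_cost(context_len: int, d_model: int, d_state: int) -> int:
    """Rough multiply-add count for a Mamba-style scan: grows as n * d * s.

    Hypothetical cost model: each token updates a fixed-size state once.
    """
    return context_len * d_model * d_state


def cost_ratio(context_len: int, d_model: int = 1024, d_state: int = 16) -> float:
    """How many times more work the attention model does at this length."""
    return attention_cost(context_len, d_model) / state_space_cost(
        context_len, d_model, d_state
    )


if __name__ == "__main__":
    # Doubling the context doubles the scan cost but quadruples attention cost.
    for n in (1_000, 2_000, 4_000, 8_000):
        print(f"context={n:>5}  attention/scan cost ratio = {cost_ratio(n):.1f}")
```

Under these assumptions the ratio itself grows in proportion to the context length, which is the intuition behind the claim that the gap widens as an agent's task runs longer.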