NVIDIA Nemotron 3 Super 120B API Benchmarks

Blog post from Deepinfra

Post Details
Company:
Date Published:
Author: Deep
Word Count: 1,867
Language: English
Hacker News Points: -
Summary

DeepInfra has announced availability of Nemotron 3 Super 120B A12B, an open-weight reasoning model from NVIDIA designed for reasoning, tool use, agentic workflows, and long-context instruction following across multiple languages. The model uses a hybrid Mamba-Transformer Mixture-of-Experts architecture with 120B total parameters (the A12B suffix denotes roughly 12B active per token), LatentMoE for routing accuracy, and Multi-Token Prediction layers for native speculative decoding.

The post benchmarks DeepInfra's API for the model against three other providers (Lightning AI, CoreWeave, and Nebius) and finds a wide spread in performance and cost. Lightning AI has the fastest output speed and the lowest time to first answer token, but it is the most expensive and does not support function calling. CoreWeave provides the best cost efficiency along with the lowest Time to First Token (TTFT), while Nebius is a balanced option with competitive pricing and function calling. DeepInfra itself offers the lowest blended cost and full feature support, including function calling and private endpoint deployment, which the post presents as making it a competitive choice for Nemotron 3 Super deployments.
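The summary compares providers on "blended cost" without defining it. As a rough illustration only, the sketch below assumes the common convention of averaging input and output prices per 1M tokens with a 3:1 input-to-output weighting; the source does not state which weighting the benchmark used. The function name `blended_price`, the provider labels, and every price shown are hypothetical placeholders, not figures from the post.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens.

    The 3:1 default weighting is an assumption, not something the
    source confirms for this benchmark.
    """
    total_weight = input_weight + output_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return (input_price * input_weight + output_price * output_weight) / total_weight


# Hypothetical (input, output) prices in USD per 1M tokens.
# These are made-up values for illustration, not benchmark results.
providers = {
    "provider_a": (0.10, 0.50),
    "provider_b": (0.20, 0.80),
}

# Rank the hypothetical providers from cheapest to most expensive blended price.
ranked = sorted(providers, key=lambda name: blended_price(*providers[name]))

for name in ranked:
    print(f"{name}: ${blended_price(*providers[name]):.3f} per 1M tokens (blended)")
```

A workload that generates far more output than input, as long reasoning traces often do, would call for a different weighting, so the ranking can change with the ratio chosen.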