NVIDIA Nemotron 3 Super 120B API Benchmarks

Post Details

Company

DeepInfra

Date Published

May 25, 2026

Author

Deep

Word Count

1,867

Company Posts That Month

23

Language

English

Hacker News Points

-

Post removed?

No

Source URL

deepinfra.com/blog/nvidia-nemotron-3-super-120b-api-benchmarks-2

Summary

DeepInfra has announced the release of Nemotron 3 Super 120B A12B, an open-weight model from NVIDIA designed for reasoning, tool use, agentic workflows, and long-context instruction following across multiple languages. The model uses a hybrid Mamba-Transformer Mixture-of-Experts architecture with 120B parameters, with LatentMoE for routing accuracy and Multi-Token Prediction layers for native speculative decoding. The post benchmarks the model as served by several API providers, including Lightning AI, CoreWeave, and Nebius, and reports a wide range in performance and cost. Lightning AI has the fastest output speed and the lowest time to first answer token, but at the highest cost and without function calling support. CoreWeave pairs the best cost efficiency with the lowest Time to First Token (TTFT), while Nebius is a balanced option with competitive pricing and function calling. DeepInfra itself offers the lowest blended cost and full feature support, including function calling and private endpoint deployment, which the post presents as making it a competitive choice for Nemotron 3 Super deployments.
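
The summary compares providers on blended cost, TTFT, and output speed without showing how those figures combine. The following is a minimal sketch of one way to do that arithmetic, not the method used in the post. It assumes a 3:1 input-to-output token weighting for the blended price, which is a common benchmarking convention but is not stated in the summary. The provider names, prices, and latencies below are hypothetical placeholders, not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class ProviderQuote:
    """Hypothetical per-provider figures; none of these come from the post."""
    name: str
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens
    ttft_s: float        # time to first token, in seconds
    output_tps: float    # output speed, in tokens per second


def blended_price(q: ProviderQuote, input_weight: float = 3.0,
                  output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens (assumed 3:1 input:output mix)."""
    total = input_weight + output_weight
    return (q.input_price * input_weight + q.output_price * output_weight) / total


def estimated_response_time(q: ProviderQuote, output_tokens: int) -> float:
    """Rough end-to-end latency: wait for the first token, then stream the rest."""
    return q.ttft_s + output_tokens / q.output_tps


# Placeholder quotes for illustration only.
quotes = [
    ProviderQuote("provider_a", input_price=0.10, output_price=0.50,
                  ttft_s=0.40, output_tps=100.0),
    ProviderQuote("provider_b", input_price=0.30, output_price=0.90,
                  ttft_s=0.25, output_tps=400.0),
]

cheapest = min(quotes, key=blended_price)
fastest = min(quotes, key=lambda q: estimated_response_time(q, 500))

print(f"cheapest blended: {cheapest.name}")
print(f"fastest for 500 output tokens: {fastest.name}")
```

With these placeholder numbers the cheapest and the fastest provider differ, which mirrors the trade-off the summary describes between cost and speed. Changing the weighting or the expected output length can change which provider comes out ahead.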

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Vector Search	1	2,268	422	128	+30%
