NVIDIA Nemotron 3 Super 120B API Benchmarks
Blog post from DeepInfra
DeepInfra has announced the availability of Nemotron 3 Super 120B A12B, an open-weight model from NVIDIA designed for reasoning, tool use, agentic workflows, and long-context instruction following across multiple languages. The model uses a hybrid Mamba-Transformer Mixture-of-Experts architecture with 120B parameters, and includes LatentMoE for routing accuracy and Multi-Token Prediction layers for native speculative decoding.
The post benchmarks the model's API across providers, comparing DeepInfra with Lightning AI, CoreWeave, and Nebius, and finds a wide range in performance and cost. Lightning AI offers the fastest output speed and the lowest time to first answer token (the latency until the first token of the final answer, after any reasoning tokens), but at the highest cost and without function calling support. CoreWeave provides the best cost efficiency together with the lowest Time to First Token (TTFT), while Nebius is a balanced option with competitive pricing and function calling. DeepInfra itself offers the lowest blended cost and full feature support, including function calling and private endpoint deployment, which makes it a competitive choice for Nemotron 3 Super deployments.
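The metrics named above (blended cost, TTFT, output speed) can be made concrete with a short Python sketch. It is illustrative only: the post does not say how its figures were computed, so the 3:1 input-to-output weighting in `blended_price`, the choice to measure output speed after the first chunk, and the names `blended_price`, `stream_stats`, and `StreamStats` are all assumptions introduced here. The numbers in the usage note are made up for arithmetic and are not provider prices or measurements.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of per-million-token prices.

    The 3:1 input:output default is an assumption, not taken from the post.
    """
    total_weight = input_weight + output_weight
    return (input_price * input_weight + output_price * output_weight) / total_weight


@dataclass
class StreamStats:
    ttft_s: float               # seconds from request to first streamed chunk
    output_tokens_per_s: float  # tokens per second after the first chunk


def stream_stats(request_time: float,
                 chunks: Iterable[Tuple[float, int]]) -> StreamStats:
    """Compute TTFT and output speed from (arrival_time_s, token_count) pairs.

    Chunks must be in arrival order; timestamps share a clock with request_time.
    """
    chunk_list = list(chunks)
    if not chunk_list:
        raise ValueError("no chunks received")
    first_time, first_tokens = chunk_list[0]
    last_time = chunk_list[-1][0]
    total_tokens = sum(count for _, count in chunk_list)
    # Assumption: speed is measured over the window after the first chunk,
    # so the tokens in that first chunk are excluded from the numerator.
    window = last_time - first_time
    speed = (total_tokens - first_tokens) / window if window > 0 else float("nan")
    return StreamStats(ttft_s=first_time - request_time, output_tokens_per_s=speed)
```

With hypothetical prices of 0.10 per million input tokens and 0.50 per million output tokens, `blended_price(0.10, 0.50)` weights input three times as heavily as output. Calling `stream_stats(0.0, [(0.5, 1), (1.0, 50), (1.5, 50)])` reports a TTFT of 0.5 seconds and 100 tokens per second over the one-second window after the first chunk. For a reasoning model, time to first answer token would be measured the same way but from the first chunk of the final answer rather than the first chunk overall.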