Company:
Date Published:
Author: Anket Sah
Word count: 782
Language: English
Hacker News points: None

Summary

Lambda's MLPerf Inference v5.1 results demonstrate significant performance improvements, with gains of up to 15.4% over prior submissions, showcasing how NVIDIA HGX B200-powered 1-Click Clusters accelerate enterprise inference workloads. The results cover models such as Llama 2 70B, Llama 3.1 405B, and Stable Diffusion XL across multiple benchmark scenarios, with Llama 3.1 405B posting notable gains in the Server scenario. The benchmarks were run on NVIDIA's latest software stack, including TensorRT 10.11 and CUDA 12.9, underscoring that the improvements come from software optimizations as well as hardware advances. All tests used a consistent system configuration, focused on maximizing throughput and minimizing latency under real-world serving conditions. Lambda's infrastructure, designed for enterprise AI, supports scalable GPU clusters with flexible rental terms, suiting both startups validating AI use cases and enterprises scaling their operations.
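For context on what the benchmark scenarios measure, here is a minimal, hypothetical sketch (not Lambda's or MLPerf's actual harness) of the two headline metric styles: the Offline scenario reports raw samples-per-second throughput, while the Server scenario reports throughput that only counts if a tail-latency bound is met. All function names, numbers, and the simplified percentile check below are illustrative assumptions.

```python
# Hypothetical illustration of MLPerf-style Offline vs Server metrics.
# All values are invented; this is not the official LoadGen logic.

def offline_throughput(total_samples: int, wall_time_s: float) -> float:
    """Offline scenario: raw throughput, samples processed per second."""
    return total_samples / wall_time_s

def server_throughput(latencies_s: list[float], bound_s: float,
                      wall_time_s: float, percentile: float = 0.99) -> float:
    """Server scenario (simplified): throughput counts only if the chosen
    latency percentile stays within the bound; otherwise the run is invalid."""
    ordered = sorted(latencies_s)
    idx = min(int(len(ordered) * percentile), len(ordered) - 1)
    if ordered[idx] > bound_s:
        return 0.0  # latency constraint violated: result does not qualify
    return len(latencies_s) / wall_time_s

# Illustrative gain calculation between two hypothetical runs:
baseline = offline_throughput(100_000, 100.0)   # 1000.0 samples/s
improved = offline_throughput(100_000, 86.7)    # faster run, same work
gain_pct = (improved / baseline - 1) * 100      # roughly a 15% gain
```

The key design point this sketch illustrates is why a model can show different gains per scenario: Offline rewards pure throughput, while Server caps how aggressively requests can be batched, since the tail-latency bound must still hold.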