Company:
Date Published:
Author: Baseten
Word count: 634
Language: English
Hacker News points: None

Summary

In February 2024, Baseten shipped improvements targeting the four key factors of model performance: latency, throughput, quality, and cost. Baseten now offers model inference on NVIDIA H100 GPUs, whose high tensor compute, memory bandwidth, and VRAM make them exceptionally well suited to running ML models, substantially reducing the cost of high-traffic workloads. Baseten also optimized Stable Diffusion XL with TensorRT, achieving 40% lower latency and 70% higher throughput on H100 GPUs compared to a baseline implementation. New model support includes SDXL Lightning, which generates images in under one second per image, and Qwen-VL, an open-source visual language model that combines vision and language capabilities. Finally, Baseten's refreshed billing dashboard gives users daily insight into usage and spend.
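To make the reported SDXL gains concrete, here is a minimal sketch that applies the stated relative improvements (40% lower latency, 70% higher throughput) to baseline numbers. The baseline figures used below are made-up placeholders for illustration, not Baseten benchmarks.

```python
def optimized_metrics(base_latency_s: float, base_throughput_ips: float):
    """Apply the reported relative improvements to baseline metrics.

    40% lower latency and 70% higher throughput, per the SDXL-on-H100
    results described above. Baseline inputs are hypothetical.
    """
    latency = base_latency_s * (1 - 0.40)    # 40% lower latency
    throughput = base_throughput_ips * 1.70  # 70% higher throughput
    return latency, throughput


# Hypothetical baseline: 5 s per image, 0.5 images/s per GPU.
latency, throughput = optimized_metrics(5.0, 0.5)
print(f"latency: {latency:.2f} s, throughput: {throughput:.2f} images/s")
```

Because throughput rises while latency falls, each GPU serves more images per hour, which is where the cost reduction for high-traffic workloads comes from.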