Company:
Date Published:
Author: Baseten
Word count: 634
Language: English
Hacker News points: None

Summary

In February 2024, Baseten shipped improvements targeting the four key factors of model performance: latency, throughput, quality, and cost. Baseten now offers model inference on NVIDIA H100 GPUs, whose high tensor compute, memory bandwidth, and VRAM make them exceptionally well suited to running ML models, substantially reducing the cost of high-traffic workloads. Baseten also optimized Stable Diffusion XL with TensorRT, achieving 40% lower latency and 70% higher throughput on H100 GPUs compared to a baseline implementation. New model support includes SDXL Lightning, which generates images in under one second per image, and Qwen-VL, an open-source visual language model that combines vision and language capabilities. Finally, Baseten's refreshed billing dashboard gives users daily insight into usage and spend.
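To make the reported SDXL gains concrete, here is a minimal sketch that applies the stated relative improvements (40% lower latency, 70% higher throughput) to baseline numbers. The baseline figures used below are made-up placeholders for illustration, not Baseten benchmarks.

```python
def optimized_metrics(base_latency_s: float, base_throughput_ips: float):
    """Apply the reported relative improvements to baseline metrics.

    40% lower latency and 70% higher throughput, per the SDXL-on-H100
    results described above. Baseline inputs are hypothetical.
    """
    latency = base_latency_s * (1 - 0.40)    # 40% lower latency
    throughput = base_throughput_ips * 1.70  # 70% higher throughput
    return latency, throughput


# Hypothetical baseline: 5 s per image, 0.5 images/s per GPU.
latency, throughput = optimized_metrics(5.0, 0.5)
print(f"latency: {latency:.2f} s, throughput: {throughput:.2f} images/s")
```

Because throughput rises while latency falls, each GPU serves more images per hour, which is where the cost reduction for high-traffic workloads comes from.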