MAX 25.2: Unleash the power of your H200s, without CUDA!

Post Details

Company

Modular

Date Published

March 25, 2025

Author

Modular Team

Word Count

1,042

Language

English

Hacker News Points

-

Source URL

www.modular.com/blog/max-25-2-unleash-the-power-of-your-h200s-without-cuda

Summary

MAX 25.2 is a major update aimed at improving the performance and deployment of large language models (LLMs) without relying on CUDA. It supports more than 500 GenAI models and offers multi-GPU compatibility on NVIDIA H100 and H200 hardware. Improvements to scheduling, batching, and caching target better performance and total cost of ownership (TCO), and MAX is reported to be 12% faster than previous benchmark results. The ultra-slim containers are 80% smaller than traditional NVIDIA containers, which shortens deployment times, and the Mojo integration enables custom, high-performance GPU programming. The update also introduces GPTQ quantization, which significantly reduces the memory needed to run large models. By rebuilding the AI stack from scratch, MAX aims to provide an intuitive "it just works" experience that avoids CUDA-related issues, giving developers and researchers a flexible way to make full use of their GPUs across diverse AI applications.
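To give a rough sense of why quantization reduces memory usage, the sketch below estimates the memory taken by model weights alone at two precisions. It is back-of-the-envelope arithmetic, not MAX code: the 70-billion-parameter model size and the 4-bit width are assumptions chosen for illustration (the summary does not state either), `weight_memory_gib` is a hypothetical helper, and the estimate ignores the per-group scales and zero-points that GPTQ stores alongside the weights, as well as the KV cache and activations.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB (hypothetical helper)."""
    total_bytes = num_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 2**30                      # bytes -> GiB


# Assumed model size for illustration only; not taken from the release.
params = 70_000_000_000

fp16_gib = weight_memory_gib(params, 16)  # 16-bit baseline (e.g. bfloat16)
int4_gib = weight_memory_gib(params, 4)   # assumed 4-bit quantized weights

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f"4-bit weights:  {int4_gib:.1f} GiB")
print(f"reduction:      {fp16_gib / int4_gib:.1f}x")
```

Under these assumptions the weights shrink by the ratio of the bit widths. Real savings are somewhat smaller once quantization metadata and runtime buffers are counted.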