Wan 2.2 video generation in less than 60 seconds

Blog post from Baseten

Post Details
Company: Baseten
Author: Mahmoud Hassan and 1 other
Word Count: 1,252
Language: English
Summary

Baseten has developed an optimized runtime for the Wan 2.2 video generation model. Compared to the default runtime, it delivers up to 3.2 times faster inference on NVIDIA Blackwell GPUs and up to 2.6 times faster inference on NVIDIA Hopper GPUs, and it reduces costs by 67% for high-volume deployments. The gains come from a series of kernel optimizations, including improvements to CUDA kernels such as RoPE attention, LayerNorm, and RMSNorm, together with inference engine advancements that maximize GPU utilization. The optimizations preserve output quality while improving speed. They rely on techniques such as Ulysses Sequence Parallelism and on tuned parameters such as sample steps and number of frames to keep performance robust across varied video generation requests. The runtime is part of Baseten's broader inference stack, which supports large-scale video generation for AI-focused companies and maintains reliability during demand spikes. Baseten continues to explore speed-quality trade-offs and suggests that future publications will cover lossy quality optimizations.
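
As a rough sanity check on how the speedup and cost figures relate, the sketch below converts a speedup into a reduction in GPU time per video. It assumes that cost scales linearly with GPU time at a fixed hourly price, which the summary does not state; the function names are hypothetical, and the summary does not say how the 67% figure was actually computed.

```python
def gpu_time_reduction(speedup: float) -> float:
    """Fraction of GPU time saved per video for a given speedup.

    Assumption (not from the source): cost is proportional to GPU time
    at a fixed hourly price, so a k-times speedup saves 1 - 1/k.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


def speedup_for_reduction(reduction: float) -> float:
    """Inverse of gpu_time_reduction: speedup implied by a fractional saving."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    # Speedups quoted in the summary (upper bounds, "up to").
    for gpu, speedup in {"Blackwell": 3.2, "Hopper": 2.6}.items():
        saved = gpu_time_reduction(speedup)
        print(f"{gpu}: {speedup}x faster -> {saved:.1%} less GPU time per video")

    # Effective speedup that a 67% cost reduction would imply under the same assumption.
    print(f"67% reduction ~ {speedup_for_reduction(0.67):.2f}x effective speedup")
```

Under this simplified model, the 67% cost figure sits between the savings implied by the Hopper and Blackwell speedups, which is at least consistent with the quoted numbers.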