Wan 2.2 video generation in less than 60 seconds

Post Details

Company

Baseten

Date Published

Jan. 24, 2026

Author

Mahmoud Hassan and 1 other

Word Count

1,252

Language

English

Hacker News Points

-

Source URL

www.baseten.co/blog/wan-2-2-video-generation-in-less-than-60-seconds

Summary

Baseten has developed an optimized runtime for the Wan 2.2 video generation model. Compared with the default runtime, it delivers up to 3.2x faster inference on NVIDIA Blackwell GPUs and up to 2.6x faster inference on NVIDIA Hopper GPUs, and it reduces costs by 67% for high-volume deployments. The gains come from two sources: optimized CUDA kernels, including RoPE attention, LayerNorm, and RMSNorm, and inference engine improvements that maximize GPU utilization. The runtime also uses Ulysses Sequence Parallelism and tunes parameters such as sample steps and frame count so that performance holds across varied video generation requests. These optimizations maintain output quality while improving speed. The runtime is part of Baseten's broader inference stack, which supports large-scale video generation for AI-focused companies and is designed to stay reliable during demand spikes. Baseten continues to explore speed-quality trade-offs and indicates that future posts may cover lossy optimizations that trade some quality for additional speed.
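To relate the speedup figures in the summary to the cost figure, here is a minimal sketch of the arithmetic. It assumes that cost per video scales linearly with GPU time at a fixed hourly price. That is an assumption made for illustration, not something the summary states, and the summary does not say how the 67% figure was derived. The function names are hypothetical.

```python
def cost_reduction_from_speedup(speedup: float) -> float:
    """Fractional cost saving per video, ASSUMING cost is proportional to GPU time.

    A job that runs `speedup` times faster occupies the GPU for 1/speedup
    of the original time, so the saving is 1 - 1/speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


def speedup_from_cost_reduction(reduction: float) -> float:
    """Inverse of the above: the speedup implied by a fractional cost saving."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    # Speedups quoted in the summary (Blackwell and Hopper respectively).
    for label, speedup in [("Blackwell", 3.2), ("Hopper", 2.6)]:
        saving = cost_reduction_from_speedup(speedup)
        print(f"{label}: {speedup}x faster -> {saving:.1%} less GPU time per video")

    # Cost reduction quoted in the summary.
    print(f"67% cheaper -> about {speedup_from_cost_reduction(0.67):.2f}x throughput per dollar")
```

Under this linear-cost assumption, a 67% cost reduction corresponds to roughly 3x more video per dollar, which sits between the two quoted speedups. Real deployment costs also depend on GPU pricing, batching, and utilization, none of which this sketch models.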