Company
Date Published
Author
Yiren Lu
Word count
563
Language
English
Hacker News points
None

Summary

Open-source text-to-video AI models like HunyuanVideo, Mochi, and Wan2.1 are rapidly approaching the quality of leading closed-source models, with HunyuanVideo, the leading open-source model, consistently at or near the top of Hugging Face's trending models. These models are large, often exceeding 10 billion parameters, but they offer Diffusers integration, FP8 model weights to reduce GPU memory usage, and fine-tuning support. Mochi is a popular, high-quality text-to-video model that is easy to deploy on Modal and supports LoRA fine-tuning. Wan2.1 is the newest state-of-the-art model, with 14 billion parameters, a smaller 1.3-billion-parameter variant, and ComfyUI integration. With GPUs becoming easier and cheaper to access, deploying open-source models like HunyuanVideo, Mochi, and Wan2.1 is becoming an increasingly attractive option.
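
As a rough illustration of the Diffusers integration mentioned above, a minimal sketch of running HunyuanVideo might look like the following. The checkpoint name, resolution, frame count, and dtype choices here are assumptions for illustration, not a definitive recipe from the post.

```python
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

# Assumed Diffusers-format checkpoint; substitute the repo you actually use.
model_id = "hunyuanvideo-community/HunyuanVideo"

# Load the large transformer in bfloat16 to keep GPU memory manageable.
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)
pipe.vae.enable_tiling()  # tile VAE decoding to reduce peak memory
pipe.to("cuda")

# Generate a short clip and write it to disk.
frames = pipe(
    prompt="A cat walks on the grass, realistic style",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(frames, "output.mp4", fps=15)
```

The same pattern, wrapped in a GPU-backed function, is roughly what a Modal deployment of one of these models would run.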