Baseten's text-to-video system runs on Nebius using the Baseten Inference Stack and is built for predictable, efficient performance. The Inference Runtime applies custom modality-specific kernels, kernel fusion, and optimized attention kernels to video workloads, while topology-aware parallelism and continuous batching handle request prioritization and latency. The inference-optimized infrastructure layer keeps performance reliable and scalable through intelligent request routing, geo-aware load balancing, SLA-aware autoscaling, and active-active reliability. Together, these components hold latency and throughput steady across varying traffic levels, so the system can scale without compromising output quality. Nebius's large GPU pools and low-friction capacity growth complement Baseten's runtime and infrastructure, absorbing demand spikes without disruption. The system's ability to adapt to real-time changes and manage multi-cloud capacity efficiently makes it robust enough for complex video generation workloads, turning a demo into a full-fledged product offering.
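To make the SLA-aware autoscaling idea concrete, here is a minimal sketch of a latency-driven scaling decision. This is not Baseten's actual controller; the function name, thresholds, and scaling rule are illustrative assumptions, showing only the general pattern of comparing an observed latency percentile against an SLA target to choose a replica count.

```python
import math

def desired_replicas(current_replicas, p95_latency_ms, sla_target_ms,
                     min_replicas=1, max_replicas=32):
    """Hypothetical SLA-aware scaling rule: scale up when observed p95
    latency exceeds the SLA target, scale down when there is headroom."""
    if p95_latency_ms > sla_target_ms:
        # Over SLA: grow proportionally to how far latency is over target.
        factor = p95_latency_ms / sla_target_ms
        desired = math.ceil(current_replicas * factor)
    elif p95_latency_ms < 0.5 * sla_target_ms:
        # Well under SLA: shed one replica at a time to avoid oscillation.
        desired = current_replicas - 1
    else:
        # Near target: hold steady.
        desired = current_replicas
    # Clamp to the allowed replica range.
    return max(min_replicas, min(desired, max_replicas))

# Example: 4 replicas, p95 at 1500 ms against a 1000 ms SLA -> scale to 6.
print(desired_replicas(4, p95_latency_ms=1500, sla_target_ms=1000))
```

A production controller would additionally smooth the signal over a window and enforce cooldowns, but the core decision, as sketched here, is a comparison of observed tail latency against the SLA target.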