Sub-second image generation with Flux.2 and Qwen-Image

Post Details

Company

Baseten

Date Published

May 18, 2026

Author

Aaryam Sharma

Word Count

750

Company Posts That Month

8

Language

English

Hacker News Points

-

Source URL

www.baseten.co/blog/sub-second-image-generation-flux2-and-qwen-image

Summary

Baseten has optimized image generation serving for the Flux.2 [dev] and Qwen-Image models. Compared to SGLang, it reports speedups of up to 2.3x and 1.6x, respectively, on NVIDIA Blackwell GPUs, and up to 1.9x and 1.1x on NVIDIA Hopper GPUs. The gains reduce single-request latency, which matters for latency-sensitive applications such as creative tools and marketing, and they also improve user experience, throughput, and cost efficiency. The optimizations combine hardware-aware quantization, memory improvements, and specialized kernels. With Baseten FP4 on B200 GPUs, Flux.2 [dev] latency falls below one second, and Qwen-Image also sees a significant speedup. The Baseten Inference Stack supports a range of image generation parameters for reliable, efficient production serving, and it can be adapted to other models for low-latency, high-reliability deployments.
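
To make the speedup figures concrete, here is a minimal sketch of how a speedup factor relates to single-request latency. It assumes the usual definition, baseline latency divided by optimized latency. The function names and the 2.0-second baseline are hypothetical and chosen only for illustration; they are not measurements from the post.

```python
def speedup(baseline_latency_s: float, optimized_latency_s: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    if baseline_latency_s <= 0 or optimized_latency_s <= 0:
        raise ValueError("latencies must be positive")
    return baseline_latency_s / optimized_latency_s


def latency_after_speedup(baseline_latency_s: float, factor: float) -> float:
    """Return the latency implied by applying a speedup factor to a baseline."""
    if baseline_latency_s <= 0 or factor <= 0:
        raise ValueError("baseline latency and factor must be positive")
    return baseline_latency_s / factor


if __name__ == "__main__":
    # Hypothetical baseline latency in seconds (an assumption, not a reported number).
    hypothetical_baseline_s = 2.0
    # 2.3x is the "up to" Blackwell speedup quoted in the summary.
    optimized_s = latency_after_speedup(hypothetical_baseline_s, 2.3)
    print(f"implied latency: {optimized_s:.2f} s")
    print(f"sub-second: {optimized_s < 1.0}")
```

Under this definition, a 2.3x speedup brings a request under one second only when the baseline is below 2.3 seconds, so a speedup factor says little about absolute latency unless the baseline is also known.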
