To optimize model inference for Stable Diffusion XL (SDXL), the author experimented with several tweaks:

- reducing the number of sampling steps from 50 to 20;
- setting classifier-free guidance (CFG) to zero after 8 steps, so that each remaining step runs a single UNet pass instead of two;
- using the refiner model for the final 20% of steps;
- compiling the model with `torch.compile` in max-autotune mode for the target A100 GPU;
- choosing an fp16 VAE and a memory-efficient attention implementation to improve memory efficiency.

Deployed in two clicks from the model library, the optimized SDXL achieved a model inference time of 1.92 seconds on an A100. The author also notes that the same optimizations apply to standard Stable Diffusion, yielding generation times of under a second on an A10G and under half a second on an A100. Sketches of how these pieces might fit together follow below.
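The section doesn't include code, so here is a minimal sketch of the setup, assuming a Hugging Face `diffusers`-style stack. The checkpoint names (`stabilityai/stable-diffusion-xl-base-1.0` and the community fp16-safe VAE `madebyollin/sdxl-vae-fp16-fix`) are illustrative assumptions, not confirmed by the source:

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionXLPipeline

# An fp16 VAE: the stock SDXL VAE can overflow in float16, so a
# community checkpoint patched for fp16 stability is a common choice.
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Memory-efficient attention: recent PyTorch versions already route through
# scaled_dot_product_attention by default; xFormers is an alternative backend.
pipe.enable_xformers_memory_efficient_attention()

# Compile the UNet with max-autotune. The first call pays a one-off
# compilation cost; later calls run kernels tuned for this GPU (an A100 here).
pipe.unet = torch.compile(pipe.unet, mode="max-autotune", fullgraph=True)
```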
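For the step-count reduction, the CFG cutoff, and the base/refiner split, a sketch continuing from the setup above might look like the following. Zeroing CFG mid-run goes through the `callback_on_step_end` hook; touching the pipeline's internal `_guidance_scale` attribute and trimming the conditioning tensors is one plausible mechanism (patterned on the dynamic-CFG example in the `diffusers` docs), not necessarily what the author did:

```python
from diffusers import StableDiffusionXLImg2ImgPipeline

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    vae=pipe.vae,
    text_encoder_2=pipe.text_encoder_2,  # share the second text encoder
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

def zero_cfg_after_8(pipeline, step, timestep, callback_kwargs):
    # From step 8 onward, zero the guidance scale and drop the unconditional
    # half of each conditioning tensor, so every later step costs one UNet
    # pass instead of two. _guidance_scale is an internal attribute; this
    # is an assumed mechanism, not a documented public API.
    if step == 8:
        pipeline._guidance_scale = 0.0
        for key in ("prompt_embeds", "add_text_embeds", "add_time_ids"):
            callback_kwargs[key] = callback_kwargs[key].chunk(2)[-1]
    return callback_kwargs

prompt = "an astronaut riding a horse on the moon"  # placeholder prompt
num_steps = 20  # down from the 50-step default

latents = pipe(
    prompt=prompt,
    num_inference_steps=num_steps,
    denoising_end=0.8,  # base model covers the first 80% of the schedule
    output_type="latent",
    callback_on_step_end=zero_cfg_after_8,
    callback_on_step_end_tensor_inputs=[
        "prompt_embeds", "add_text_embeds", "add_time_ids"
    ],
).images

image = refiner(
    prompt=prompt,
    num_inference_steps=num_steps,
    denoising_start=0.8,  # refiner takes over for the final 20%
    image=latents,
).images[0]
image.save("astronaut.png")
```

One caveat: dropping the unconditional batch halves the UNet's input batch size mid-run, which forces an extra `torch.compile` recompilation on the first generation; subsequent calls reuse the tuned kernels.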