SDXL is a text-to-image model that generates high-quality images with a flexible, modular architecture composed of four major components: CLIP, UNet, Refiner, and VAE. The UNet is the main component: it runs once per inference step, iteratively building an image representation in latent space, so it accounts for most of the work in a generation request. Optimizing SDXL means optimizing each component in the pipeline individually with NVIDIA TensorRT, a software development kit for high-performance deep learning inference. The process has three stages: exporting the model pipeline to ONNX, building an optimized engine for each sub-model within SDXL, and deploying the optimized models as API endpoints. With TensorRT, SDXL achieves up to 40% lower latency and 70% higher throughput than the unoptimized model on the same hardware, making it viable for latency-sensitive and cost-sensitive use cases. The same techniques apply to similar image generation pipelines, including SDXL Turbo, which generates images in far fewer inference steps, trading some image quality for much higher speed.
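To make the reasoning above concrete, here is a minimal Python sketch of a latency model for the pipeline. It assumes the text encoder and VAE run once per request while the UNet runs once per inference step, and it leaves out the Refiner for simplicity. All names (`StageTimings`, `end_to_end_latency`, `apply_claimed_gains`) and every timing value are hypothetical, chosen only for illustration; they are not measurements. The 40% and 70% figures are the "up to" numbers quoted above, so real results may be smaller.

```python
from dataclasses import dataclass


@dataclass
class StageTimings:
    """Hypothetical per-stage latencies in seconds (illustrative, not measured)."""
    text_encode: float  # CLIP text encoding, assumed to run once per request
    unet_step: float    # a single UNet denoising step
    vae_decode: float   # VAE decode, assumed to run once per request


def end_to_end_latency(t: StageTimings, steps: int) -> float:
    """Total latency when the UNet runs `steps` times and other stages run once."""
    return t.text_encode + steps * t.unet_step + t.vae_decode


def unet_share(t: StageTimings, steps: int) -> float:
    """Fraction of total latency spent inside the UNet loop."""
    return steps * t.unet_step / end_to_end_latency(t, steps)


def apply_claimed_gains(latency_s: float, images_per_s: float,
                        latency_cut: float = 0.40,
                        throughput_gain: float = 0.70) -> tuple[float, float]:
    """Apply the best-case gains: 40% lower latency, 70% higher throughput."""
    return latency_s * (1 - latency_cut), images_per_s * (1 + throughput_gain)


# Assumed baseline numbers, for illustration only.
baseline = StageTimings(text_encode=0.05, unet_step=0.10, vae_decode=0.20)
latency = end_to_end_latency(baseline, steps=30)
share = unet_share(baseline, steps=30)
new_latency, new_throughput = apply_claimed_gains(latency, images_per_s=1.0)

print(f"baseline latency: {latency:.2f} s, UNet share: {share:.0%}")
print(f"best case after optimization: {new_latency:.2f} s, "
      f"{new_throughput:.2f} images/s")
```

Under these assumed timings the UNet loop dominates the request, which is why it is the most valuable component to optimize. At a fixed hardware price, 70% higher throughput also means each image costs 1/1.7 of what it did before, roughly 41% less.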