FLUX is fast and it's open source
Blog post from Replicate
Replicate has made the FLUX model substantially faster and released the optimizations as open source so that others can inspect and build on them. The speedup comes from two sources: optimization of the model itself and a new synchronous HTTP API. Together they shorten image generation times, with the reported timings taken using the Python client from the west coast of the US. The model optimizations build on Alex Redden's flux-fp8-api and use torch.compile and fast cuDNN attention kernels. Quantization changes the output slightly but, according to the post, does not compromise quality. Replicate shares this work to encourage community collaboration and to make open-source models as fast as proprietary ones, and it invites contributions that refine FLUX further. It also encourages users to explore, fine-tune, and deploy custom versions of FLUX, and says work on speed and functionality is ongoing.
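
To make the quantization point concrete, here is a minimal sketch of why storing values at lower precision changes results only slightly. It is an illustration under stated assumptions, not the scheme used by flux-fp8-api: it uses symmetric 8-bit integer quantization as a simpler stand-in for fp8, the weight values are made up, and the helper names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] using one shared scale.

    Stand-in for fp8: int8 is used only because it is easy to show in
    plain Python. This is not how flux-fp8-api quantizes FLUX.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Made-up example weights, not taken from any real model.
weights = [0.8123, -0.4571, 0.0032, 0.2999, -1.2704]

codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The round trip is lossy, but each value moves by at most half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:    ", codes)
print("restored: ", [round(r, 4) for r in restored])
print("max error:", max_error, "(half a step is", scale / 2, ")")
```

Each restored value differs from the original by no more than half of one quantization step, which is the sense in which quantized output is slightly different yet close to the original. Whether that difference is visible in a generated image depends on the model and is something the post evaluates on real outputs, not something this toy example can show.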