FLUX fine-tunes are now fast
Blog post from Replicate
Replicate has shipped optimizations that make fine-tuned FLUX models run as fast as the base models, and the improvements are open source. The speedups come from Alex Redden’s flux-fp8-api, torch.compile, and CuDNN attention kernels, and the stack supports loading LoRAs from sources like Hugging Face and Civitai.

Under the hood, fine-tunes are quantized to fp8 and merged into the base model, and when go_fast=true is enabled, lora_scale is automatically increased to keep output quality optimal. Quantization alters outputs slightly, but the effect on quality is minimal, and all models, existing and future, benefit from the enhancements.

Acknowledging that comparing model outputs is inherently difficult, Replicate emphasizes transparency and contributing optimizations back to the open-source community, and commits to making both running and training fine-tunes faster through ongoing development and community collaboration.
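To picture the torch.compile and CuDNN attention piece, here is a minimal sketch, not Replicate's actual code: the `attention` function and the tensor shapes are placeholders standing in for a FLUX attention layer, and the CuDNN backend selection requires a recent PyTorch on a supported GPU.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

# Toy shapes standing in for a FLUX attention layer:
# (batch, heads, sequence, head_dim). CuDNN attention wants fp16/bf16.
q = torch.randn(1, 24, 4096, 128, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

@torch.compile  # fuses surrounding ops; the first call pays the compile cost
def attention(q, k, v):
    return F.scaled_dot_product_attention(q, k, v)

# Restrict scaled-dot-product-attention dispatch to the CuDNN kernel
# inside this context.
with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = attention(q, k, v)
```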
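The merge-then-quantize step can be sketched in a few lines. This is a simplified illustration under assumed shapes, with a hypothetical helper name and per-tensor scaling; the real pipeline (via flux-fp8-api) covers many layers and more careful scaling.

```python
import torch

def merge_lora_and_quantize(base_weight, lora_A, lora_B, lora_scale):
    """Fold a LoRA delta into a base weight, then quantize to fp8.

    W' = W + lora_scale * (B @ A), stored as float8_e4m3fn plus a
    per-tensor scale so it can be dequantized at matmul time.
    """
    merged = base_weight + lora_scale * (lora_B @ lora_A)

    # A per-tensor scale maps the weight range onto fp8's representable range.
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    scale = merged.abs().max() / fp8_max
    quantized = (merged / scale).to(torch.float8_e4m3fn)
    return quantized, scale

# Toy shapes: a 512x512 layer with a rank-16 LoRA.
W = torch.randn(512, 512)
A = torch.randn(16, 512) * 0.01   # LoRA down-projection
B = torch.randn(512, 16) * 0.01   # LoRA up-projection

W_fp8, scale = merge_lora_and_quantize(W, A, B, lora_scale=1.1)
W_restored = W_fp8.to(torch.float32) * scale  # dequantize for use
```

The slightly raised lora_scale in the sketch mirrors the post's point: fp8 quantization mildly dampens the fine-tune's effect, so the scale is bumped to compensate.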
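On the consumer side, the fast path is a toggle on the model's inputs. A usage sketch with the Replicate Python client; the model name is a placeholder, and go_fast / lora_scale are the parameter names the post describes:

```python
import replicate

# "your-username/your-flux-fine-tune" is a placeholder model name.
output = replicate.run(
    "your-username/your-flux-fine-tune",
    input={
        "prompt": "a photo of TOK riding a bicycle",
        "go_fast": True,   # use the fp8 fast path described in the post
        # lora_scale is increased automatically when go_fast is enabled,
        # but it can still be set explicitly:
        # "lora_scale": 1.0,
    },
)
print(output)
```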