
FLUX fine-tunes are now fast

Blog post from Replicate

Post Details
Company: Replicate
Date Published: -
Author: bfirsh
Word Count: 595
Language: English
Hacker News Points: -
Summary

Replicate has released open-source optimizations that make FLUX fine-tunes significantly faster to run, bringing their inference speed in line with the base models. The speedup combines Alex Redden's flux-fp8-api, torch.compile, and CuDNN attention kernels, and supports loading LoRAs from sources such as Hugging Face and Civitai. Fine-tunes are quantized to fp8 and merged with the base model, and lora_scale is automatically increased for optimal output when go_fast=true is enabled. Quantization slightly alters outputs but has minimal effect on quality, and all fine-tunes, existing and future, benefit from the changes. Acknowledging the difficulty of comparing model outputs, Replicate emphasizes transparency and the value of contributing optimizations back to the open-source community, and commits to making training faster as well through ongoing development and community collaboration.
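
As a rough illustration of the go_fast behaviour described above, a call to a FLUX fine-tune via Replicate's Python client might look like the sketch below. The model slug, prompt, and exact input fields are assumptions for illustration, not taken from the post; check your own model's input schema.

```python
import replicate

# Hypothetical fine-tune slug; replace with your own FLUX fine-tune.
output = replicate.run(
    "your-username/your-flux-fine-tune",
    input={
        "prompt": "a photo of TOK riding a bicycle",
        # Opt into the fp8 + torch.compile fast path described in the post.
        # Per the summary, lora_scale is bumped automatically when go_fast
        # is enabled, so it is left at its default here.
        "go_fast": True,
    },
)
print(output)
```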