The pretrain-finetune paradigm has revolutionized machine learning by enabling LLMs to align with distinct user preferences or specialized task requirements through fine-tuning. However, serving many such fine-tuned models to multiple tenants is expensive: each variant must be stored and kept in GPU memory. Researchers have proposed BitDelta, which decomposes the weights of a fine-tuned model into its pre-trained base component plus a delta, and quantizes that delta to 1 bit without compromising performance. This addresses both storage and serving costs, since a single full-precision base model can be shared across tenants while each fine-tune is represented by a compact delta, reducing GPU memory requirements and improving inference speed. BitDelta is fast, general, and retains the information captured by diverse kinds of fine-tuning, making it a promising solution for multi-tenant serving.
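To make the decomposition concrete, the sketch below shows one plausible way to 1-bit quantize a fine-tune delta in PyTorch: keep only the sign of each delta entry plus a single per-tensor scale (here the mean absolute value of the delta, which minimizes the L2 error of the sign approximation). This is a minimal illustration under those assumptions, not the paper's full method (BitDelta additionally calibrates the scales), and the function names `compress_delta` and `reconstruct_weight` are hypothetical.

```python
import torch

def compress_delta(base_weight: torch.Tensor, finetuned_weight: torch.Tensor):
    """Quantize the fine-tune delta to 1 bit: a sign matrix plus one scale.

    The scale alpha is the mean absolute value of the delta, so that
    alpha * sign(delta) is a least-squares 1-bit approximation of delta.
    """
    delta = finetuned_weight - base_weight
    sign = torch.sign(delta)           # +1 / -1 mask, packable to 1 bit per weight
    alpha = delta.abs().mean()         # single per-tensor scale factor
    return sign.to(torch.int8), alpha

def reconstruct_weight(base_weight: torch.Tensor, sign: torch.Tensor, alpha: torch.Tensor):
    """Approximate the fine-tuned weight as W_base + alpha * sign(delta)."""
    return base_weight + alpha * sign.to(base_weight.dtype)

# Toy usage: a random "base" matrix and a lightly fine-tuned copy of it.
base = torch.randn(1024, 1024)
finetuned = base + 0.01 * torch.randn(1024, 1024)
sign, alpha = compress_delta(base, finetuned)
approx = reconstruct_weight(base, sign, alpha)
print("mean abs reconstruction error:", (approx - finetuned).abs().mean().item())
```

Under this scheme, each tenant's fine-tune costs roughly 1 bit per weight on top of the shared base model, which is where the storage and memory savings come from.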