The pretrain-finetune paradigm has revolutionized machine learning by enabling LLMs to align with distinct user preferences or specialized task requirements through fine-tuning. However, serving many such fine-tuned models to multiple tenants is expensive: each variant must be stored and kept in GPU memory. Researchers have proposed BitDelta, which decomposes the weights of a fine-tuned model into its pre-trained base component plus a delta, and quantizes that delta to 1 bit without compromising performance. This addresses both storage and serving costs, since a single full-precision base model can be shared across tenants while each fine-tune is represented by a compact delta, reducing GPU memory requirements and improving inference speed. BitDelta is fast, general, and retains the information captured by diverse kinds of fine-tuning, making it a promising solution for multi-tenant serving.
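To make the decomposition concrete, the sketch below shows one plausible way to 1-bit quantize a fine-tune delta in PyTorch: keep only the sign of each delta entry plus a single per-tensor scale (here the mean absolute value of the delta, which minimizes the L2 error of the sign approximation). This is a minimal illustration under those assumptions, not the paper's full method (BitDelta additionally calibrates the scales), and the function names `compress_delta` and `reconstruct_weight` are hypothetical.

```python
import torch

def compress_delta(base_weight: torch.Tensor, finetuned_weight: torch.Tensor):
    """Quantize the fine-tune delta to 1 bit: a sign matrix plus one scale.

    The scale alpha is the mean absolute value of the delta, so that
    alpha * sign(delta) is a least-squares 1-bit approximation of delta.
    """
    delta = finetuned_weight - base_weight
    sign = torch.sign(delta)           # +1 / -1 mask, packable to 1 bit per weight
    alpha = delta.abs().mean()         # single per-tensor scale factor
    return sign.to(torch.int8), alpha

def reconstruct_weight(base_weight: torch.Tensor, sign: torch.Tensor, alpha: torch.Tensor):
    """Approximate the fine-tuned weight as W_base + alpha * sign(delta)."""
    return base_weight + alpha * sign.to(base_weight.dtype)

# Toy usage: a random "base" matrix and a lightly fine-tuned copy of it.
base = torch.randn(1024, 1024)
finetuned = base + 0.01 * torch.randn(1024, 1024)
sign, alpha = compress_delta(base, finetuned)
approx = reconstruct_weight(base, sign, alpha)
print("mean abs reconstruction error:", (approx - finetuned).abs().mean().item())
```

Under this scheme, each tenant's fine-tune costs roughly 1 bit per weight on top of the shared base model, which is where the storage and memory savings come from.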