Company
Date Published
Author
Rohit Agarwal
Word count
271
Language
English
Hacker News points
None

Summary

Low-Rank Adaptation (LoRA) is introduced as a technique for reducing the number of trainable parameters in natural language processing tasks: the pre-trained weights are left in place, and trainable rank-decomposition matrices are injected into each layer of the Transformer architecture, so far fewer parameters need to be updated than in traditional fine-tuning. Despite this reduction, LoRA matches or even surpasses fine-tuned model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, and it delivers higher training throughput without adding inference latency. Compared with fine-tuning GPT-3 175B with Adam, LoRA can cut the number of trainable parameters by up to 10,000 times and the GPU memory requirement by three times. A single pre-trained model can therefore be shared and adapted through many small LoRA modules for different tasks, minimizing storage needs and task-switching overhead. LoRA also makes training more efficient and lowers the hardware requirements by up to three times when adaptive optimizers are used, since gradients and optimizer states only need to be computed and maintained for the small adaptation matrices.
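
To make the idea concrete, below is a minimal sketch of a LoRA-augmented linear layer in PyTorch. The class name, rank, and scaling values are illustrative assumptions, not taken from the article; the point is simply that the pre-trained weight is frozen while two small rank-decomposition matrices carry all the trainable parameters.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        # Pre-trained weight stays frozen; only A and B are trained.
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)
        # Rank-decomposition matrices: B starts at zero so training begins from the base model.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the low-rank adaptation; after training, B A can be
        # merged into the base weight, so inference latency is unchanged.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


layer = LoRALinear(768, 768, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable}")  # 2 * 8 * 768 = 12,288 instead of 768 * 768 = 589,824
```

In this sketch, only the A and B matrices receive gradients and optimizer states, which illustrates why LoRA reduces memory use under adaptive optimizers, and why many small task-specific modules can be stored alongside one shared pre-trained model.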