Company
Date Published
Author
Rohit Agarwal
Word count
271
Language
English
Hacker News points
None

Summary

Low-Rank Adaptation (LoRA) is introduced as a technique for reducing the number of trainable parameters in natural language processing tasks: the pre-trained weights are left in place, and trainable rank-decomposition matrices are injected into each layer of the Transformer architecture, so far fewer parameters need to be updated than in traditional fine-tuning. Despite this reduction, LoRA matches or even surpasses fine-tuned model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, and it delivers higher training throughput without adding inference latency. Compared with fine-tuning GPT-3 175B with Adam, LoRA can cut the number of trainable parameters by up to 10,000 times and the GPU memory requirement by three times. A single pre-trained model can therefore be shared and adapted through many small LoRA modules for different tasks, minimizing storage needs and task-switching overhead. LoRA also makes training more efficient and lowers the hardware requirements by up to three times when adaptive optimizers are used, since gradients and optimizer states only need to be computed and maintained for the small adaptation matrices.
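
To make the idea concrete, below is a minimal sketch of a LoRA-augmented linear layer in PyTorch. The class name, rank, and scaling values are illustrative assumptions, not taken from the article; the point is simply that the pre-trained weight is frozen while two small rank-decomposition matrices carry all the trainable parameters.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        # Pre-trained weight stays frozen; only A and B are trained.
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)
        # Rank-decomposition matrices: B starts at zero so training begins from the base model.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the low-rank adaptation; after training, B A can be
        # merged into the base weight, so inference latency is unchanged.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


layer = LoRALinear(768, 768, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable}")  # 2 * 8 * 768 = 12,288 instead of 768 * 768 = 589,824
```

In this sketch, only the A and B matrices receive gradients and optimizer states, which illustrates why LoRA reduces memory use under adaptive optimizers, and why many small task-specific modules can be stored alongside one shared pre-trained model.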