
LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models - Summary

Blog post from Portkey

Post Details

Company: Portkey
Date Published: -
Author: The Quill
Word Count: 230
Language: English
Hacker News Points: -
Summary

The paper introduces LLMLingua, a method for compressing prompts to large language models (LLMs) in order to accelerate inference and reduce cost. The approach combines three components: a budget controller that determines compression ratios, a token-level iterative compression algorithm, and an instruction tuning-based method for distribution alignment between the compressor and the target LLM. Evaluated on four datasets from diverse domains, LLMLingua achieves state-of-the-art performance, with up to 20x compression at minimal loss in downstream quality. Because it preserves the semantic integrity of prompts, works with black-box LLMs accessible only via API, and requires no gradient flow through the target model, LLMLingua is suitable for a wide range of LLM applications.
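To make the token-level compression idea concrete, here is a minimal sketch, not the authors' implementation: in LLMLingua a small language model scores tokens by their log-probabilities and low-information tokens are dropped until a budget is met; in this toy version a hypothetical frequency-based `token_importance` function stands in for the small model, and `compress_prompt` keeps only the highest-scoring tokens up to the target ratio.

```python
# Illustrative sketch of prompt compression in the spirit of LLMLingua.
# A real system would score tokens with a small language model; here a toy
# frequency-based score mimics "rarer (higher-perplexity) tokens carry
# more information". All names below are hypothetical, for illustration only.
from collections import Counter

def token_importance(tokens):
    """Toy stand-in for a small LM scorer: rarer tokens score higher."""
    counts = Counter(tokens)
    total = len(tokens)
    return {i: 1.0 - counts[t] / total for i, t in enumerate(tokens)}

def compress_prompt(prompt, ratio=0.5):
    """Drop the least-informative tokens until the prompt fits the
    target budget (ratio of the original token count)."""
    tokens = prompt.split()
    budget = max(1, int(len(tokens) * ratio))
    scores = token_importance(tokens)
    # Keep the highest-scoring token positions, then restore original order.
    keep = sorted(sorted(scores, key=scores.get, reverse=True)[:budget])
    return " ".join(tokens[i] for i in keep)

prompt = "the quick brown fox jumps over the lazy dog near the quiet river"
print(compress_prompt(prompt, ratio=0.5))
```

Note that, as in the paper's black-box setting, nothing here touches the target LLM's gradients: compression happens entirely on the prompt text before it is sent over the API.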