Extending Context Window of a 7B LLM from 8k to 32k using PoSE (Positional Skip-wisE)
Blog post from SuperAGI
Positional Skip-wisE (PoSE) training is introduced as an efficient method to extend the context window of Large Language Models (LLMs) without the high computational cost of full-length fine-tuning. Whereas methods such as Position Interpolation still fine-tune on inputs at the full target length, PoSE manipulates position indices within a fixed training window to simulate longer inputs, cutting memory and time overhead while maintaining performance. This approach was applied to extend the context window of the Mistral 7B model from 8K to 32K, and it proved effective on language modeling and information extraction tasks with minimal performance degradation. PoSE is compatible with all RoPE-based LLMs and with existing position interpolation strategies, making it a cost-effective route to handling extremely long contexts. The resulting model is available on Hugging Face, demonstrating its practical viability.
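The core idea of manipulating position indices can be sketched as follows. This is a minimal illustration, not the authors' implementation: it splits the fixed training window into chunks, keeps the first chunk at its original positions, and shifts later chunks by randomly sampled skips so that the position ids collectively span the full target context length (here 8 and 32 stand in for the 8K and 32K windows; `num_chunks` is a hypothetical parameter).

```python
import random

def pose_position_ids(train_len, target_len, num_chunks=2):
    """Skip-wise position ids: simulate a target_len context while
    only ever attending over train_len tokens."""
    # split the fixed training window into contiguous chunks
    base = train_len // num_chunks
    sizes = [base] * num_chunks
    sizes[-1] += train_len - base * num_chunks

    # budget of positions we may skip over between chunks
    total_skip = target_len - train_len

    # chunk 0 keeps its original positions (skip 0); later chunks get
    # non-decreasing random skip offsets so ids stay strictly increasing
    skips = sorted(random.randint(0, total_skip) for _ in range(num_chunks - 1))
    skips = [0] + skips

    position_ids, start = [], 0
    for size, skip in zip(sizes, skips):
        position_ids.extend(range(start + skip, start + skip + size))
        start += size
    return position_ids
```

Because the skips are resampled every training step, the model eventually sees position ids covering the entire 0..target_len range, even though each batch is only train_len tokens long; at inference time it can then attend over genuinely long inputs.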