Extending Context Window of a 7B LLM from 8k to 32k using PoSE (Positional Skip-wisE)
Blog post from SuperAGI
Positional Skip-wisE (PoSE) training is introduced as an efficient method to extend the context window of Large Language Models (LLMs) without the high computational cost of full-length fine-tuning. Whereas methods such as Position Interpolation still fine-tune on inputs at the full target length, PoSE manipulates position indices within a fixed training window to simulate longer inputs, cutting memory and time overhead while maintaining performance. This approach was applied to extend the context window of the Mistral 7B model from 8K to 32K, and it proved effective on language modeling and information extraction tasks with minimal performance degradation. PoSE is compatible with all RoPE-based LLMs and with existing position interpolation strategies, making it a cost-effective route to handling extremely long contexts. The resulting model is available on Hugging Face, demonstrating its practical viability.
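The core idea of manipulating position indices can be sketched as follows. This is a minimal illustration, not the authors' implementation: it splits the fixed training window into chunks, keeps the first chunk at its original positions, and shifts later chunks by randomly sampled skips so that the position ids collectively span the full target context length (here 8 and 32 stand in for the 8K and 32K windows; `num_chunks` is a hypothetical parameter).

```python
import random

def pose_position_ids(train_len, target_len, num_chunks=2):
    """Skip-wise position ids: simulate a target_len context while
    only ever attending over train_len tokens."""
    # split the fixed training window into contiguous chunks
    base = train_len // num_chunks
    sizes = [base] * num_chunks
    sizes[-1] += train_len - base * num_chunks

    # budget of positions we may skip over between chunks
    total_skip = target_len - train_len

    # chunk 0 keeps its original positions (skip 0); later chunks get
    # non-decreasing random skip offsets so ids stay strictly increasing
    skips = sorted(random.randint(0, total_skip) for _ in range(num_chunks - 1))
    skips = [0] + skips

    position_ids, start = [], 0
    for size, skip in zip(sizes, skips):
        position_ids.extend(range(start + skip, start + skip + size))
        start += size
    return position_ids
```

Because the skips are resampled every training step, the model eventually sees position ids covering the entire 0..target_len range, even though each batch is only train_len tokens long; at inference time it can then attend over genuinely long inputs.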