SuperHOT 8k Token Context Models Are Here For Text Generation
Blog post from RunPod
Esteemed contributor TheBloke has released quantizations of several well-known AI models, including WizardLM, Vicuna, and Manticore, on Hugging Face, allowing for increased context retention in AI storytelling.

This addresses a common frustration: unlike a human writing partner, who can recall earlier events at will, an AI model forgets critical scene details as soon as they fall outside its context window. With a traditional 2k token limit, details like character positioning and attire are quickly pushed out of context, which can break immersion. An expanded 8k context limit lets users incorporate far more detailed character histories and scene context, improving narrative continuity.

To take advantage of the larger context with these Llama-based models, load them with ExLlama and adjust the sequence-length settings; in text-generation-webui this typically means raising max_seq_len to 8192 and setting compress_pos_emb to 4. Further guidance is available through the community Discord.
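The reason a compression setting is involved at all is positional interpolation: token positions are divided by a scale factor so that an 8192-token sequence still maps into the 0 to 2048 position range the base model was trained on (8192 / 2048 = 4). Here is a minimal sketch of that idea; the function name and structure are illustrative, not ExLlama's actual implementation:

```python
def scaled_positions(seq_len: int, compress_factor: int) -> list[float]:
    """Map positions 0..seq_len-1 into the base model's trained
    position range by dividing each position by the factor."""
    return [i / compress_factor for i in range(seq_len)]

# With an 8192-token context and a compression factor of 4, every
# position lands inside the 0..2048 range the model was trained on.
positions = scaled_positions(8192, 4)
print(max(positions))  # 2047.75, within the trained range
```

This is why the two settings go together: max_seq_len declares the longer window, and the compression factor squeezes those positions back into territory the model understands.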