
The Heterogeneous Feature of RoPE-based Attention in Long-Context LLMs

Blog post from HuggingFace

Post Details

Company: HuggingFace
Date Published: -
Author: Xiaoran Liu (SII)
Word Count: 1,834
Language: -
Hacker News Points: -
Summary

Xiaoran Liu's presentation explores the heterogeneous features of attention in long-context large language models (LLMs), focusing on how attention components across different query-key (qk) dimensions play distinct roles. The study explains this heterogeneity from the Rotary Position Embedding (RoPE) perspective: RoPE encodes position with sinusoidal functions whose frequencies vary across dimensions, giving each dimension its own periodicity and monotonicity. Lower dimensions, which rotate with short periods, remain stable and capture local patterns, while upper dimensions, with much longer periods, carry long-range dependencies. This view leads to applications in length extrapolation, cache optimization, and long-video modeling, including methods such as FourierAttention for efficient cache compression. The findings also inform multi-modality embedding and diffusion-based language models, and carry broader implications for architecture, training, and evaluation in long-context processing.
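
To see where this dimension-wise heterogeneity comes from, the sketch below (not from the post; a minimal illustration assuming the standard RoPE parameterization theta_i = base**(-2i/d), with illustrative head_dim, base, and context_len values) computes the rotation period of each qk dimension pair: lower pairs complete many cycles within the context window, while upper pairs stay within a fraction of a single cycle.

```python
# Minimal sketch (not the author's code): per-dimension RoPE frequencies and periods.
# Assumes the standard RoPE parameterization theta_i = base**(-2i/d); head_dim,
# base, and context_len are illustrative values, not taken from the post.
import numpy as np

head_dim = 128          # dimension of one attention head (assumed)
base = 10_000.0         # common default RoPE base (assumed)
context_len = 32_768    # an example long-context window

# One rotation frequency per 2-D subspace (pair of qk dimensions).
i = np.arange(head_dim // 2)
theta = base ** (-2.0 * i / head_dim)   # high frequency for low i, low frequency for high i
period = 2 * np.pi / theta              # period in token positions

# Lower dimensions: short periods -> many full rotations inside the context,
# so their sinusoids are periodic and capture local, position-sensitive detail.
# Upper dimensions: periods far longer than the context -> the rotation covers
# only a fraction of a cycle, behaving monotonically and supporting long-range
# dependencies.
local_pairs = int(np.sum(period < context_len))
print(f"dimension pairs with period < {context_len} tokens: {local_pairs} / {head_dim // 2}")
print(f"shortest period: {period[0]:.1f} tokens, longest period: {period[-1]:.1f} tokens")
```

With these assumed values, the shortest period is only a few tokens while the longest exceeds the 32K-token window, which is the frequency contrast the post uses to explain why lower and upper dimensions take on such different roles.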