
The Heterogeneous Feature of RoPE-based Attention in Long-Context LLMs

Blog post from HuggingFace

Post Details

Company: HuggingFace
Date Published: -
Author: Xiaoran Liu (SII)
Word Count: 1,834
Language: -
Hacker News Points: -
Summary

Xiaoran Liu's presentation explores the heterogeneous features of attention in long-context large language models (LLMs), focusing on how attention components across different query-key (qk) dimensions play distinct roles. The study explains this heterogeneity from the Rotary Position Embedding (RoPE) perspective: RoPE encodes position with sinusoidal functions whose frequencies vary across dimensions, giving each dimension its own periodicity and monotonicity. Lower dimensions, which rotate with short periods, remain stable and capture local patterns, while upper dimensions, with much longer periods, carry long-range dependencies. This view leads to applications in length extrapolation, cache optimization, and long-video modeling, including methods such as FourierAttention for efficient cache compression. The findings also inform multi-modality embedding and diffusion-based language models, and carry broader implications for architecture, training, and evaluation in long-context processing.
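
To see where this dimension-wise heterogeneity comes from, the sketch below (not from the post; a minimal illustration assuming the standard RoPE parameterization theta_i = base**(-2i/d), with illustrative head_dim, base, and context_len values) computes the rotation period of each qk dimension pair: lower pairs complete many cycles within the context window, while upper pairs stay within a fraction of a single cycle.

```python
# Minimal sketch (not the author's code): per-dimension RoPE frequencies and periods.
# Assumes the standard RoPE parameterization theta_i = base**(-2i/d); head_dim,
# base, and context_len are illustrative values, not taken from the post.
import numpy as np

head_dim = 128          # dimension of one attention head (assumed)
base = 10_000.0         # common default RoPE base (assumed)
context_len = 32_768    # an example long-context window

# One rotation frequency per 2-D subspace (pair of qk dimensions).
i = np.arange(head_dim // 2)
theta = base ** (-2.0 * i / head_dim)   # high frequency for low i, low frequency for high i
period = 2 * np.pi / theta              # period in token positions

# Lower dimensions: short periods -> many full rotations inside the context,
# so their sinusoids are periodic and capture local, position-sensitive detail.
# Upper dimensions: periods far longer than the context -> the rotation covers
# only a fraction of a cycle, behaving monotonically and supporting long-range
# dependencies.
local_pairs = int(np.sum(period < context_len))
print(f"dimension pairs with period < {context_len} tokens: {local_pairs} / {head_dim // 2}")
print(f"shortest period: {period[0]:.1f} tokens, longest period: {period[-1]:.1f} tokens")
```

With these assumed values, the shortest period is only a few tokens while the longest exceeds the 32K-token window, which is the frequency contrast the post uses to explain why lower and upper dimensions take on such different roles.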