Key-Value Means: Transformers with Expandable Block-Compressed Memory

Post Details

Company: Featherless
Date Published: May 13, 2026
Author: Featherless
Word Count: 2,418
Company Posts That Month: 1
Language: English
Hacker News Points: -
Post removed?: No
Source URL: featherless.ai/blog/key-value-means-transformers-with-expandable-block-compressed-memory

Summary

Key-Value Means (KVM) is an architecture that combines the fixed-cost inference of linear RNNs with the high-fidelity memory of full softmax attention. It keeps the familiar Transformer key-value cache structure but treats a portion of that cache as an expandable recurrent state, so memory and compute can be traded off as context length increases. This lets KVM interpolate between a fixed-state linear RNN and full attention. State updates are managed with Block Sliding Window Attention (BSWA), so that tokens are represented without redundancy. Just-in-time normalization and a winner-take-all merging strategy keep state rows distinct and usable. The state grows by appending novel, non-redundant tokens, which yields sublinear memory growth over long contexts. This design lets KVM perform well on long-context benchmarks, maintaining strong recall without expanding a full KV cache, and places it between traditional RNNs and Transformers in how it manages memory.
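The merge-or-append behavior described above can be illustrated with a toy sketch. This is not the post's implementation. The class name `ToyMeansState`, the cosine-similarity test, the `novelty_threshold` value, and the choice to store running sums and divide only at read time (one possible reading of "just-in-time normalization") are all assumptions made for illustration, and block-wise processing via BSWA is omitted entirely.

```python
import math


def _unit(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class ToyMeansState:
    """Hypothetical toy state: each row holds running sums of merged keys and values."""

    def __init__(self, novelty_threshold=0.8):
        # Assumed hyperparameter: minimum cosine similarity required to merge.
        self.novelty_threshold = novelty_threshold
        self.key_sums = []
        self.value_sums = []
        self.counts = []

    def update(self, key, value):
        unit_key = _unit(key)
        best_row, best_sim = -1, -1.0
        for row, key_sum in enumerate(self.key_sums):
            # Rows are stored as sums and normalized only when compared.
            sim = sum(a * b for a, b in zip(unit_key, _unit(key_sum)))
            if sim > best_sim:
                best_row, best_sim = row, sim
        if best_row >= 0 and best_sim >= self.novelty_threshold:
            # Winner-take-all: only the single most similar row absorbs the token.
            self.key_sums[best_row] = [a + b for a, b in zip(self.key_sums[best_row], unit_key)]
            self.value_sums[best_row] = [a + b for a, b in zip(self.value_sums[best_row], value)]
            self.counts[best_row] += 1
        else:
            # Novel token: the state grows by one row.
            self.key_sums.append(list(unit_key))
            self.value_sums.append(list(value))
            self.counts.append(1)

    def rows(self):
        """Return (mean_key, mean_value) per row, dividing by counts at read time."""
        return [
            ([k / c for k in ks], [v / c for v in vs])
            for ks, vs, c in zip(self.key_sums, self.value_sums, self.counts)
        ]


state = ToyMeansState(novelty_threshold=0.8)
tokens = [
    ([1.0, 0.0], [1, 0]),
    ([0.9, 0.1], [3, 0]),   # close to the first key, so it merges
    ([0.0, 1.0], [0, 5]),   # orthogonal key, so it is appended
    ([1.0, 0.05], [2, 0]),  # merges into the first row again
]
for key, value in tokens:
    state.update(key, value)
print(state.counts)  # four tokens compressed into two rows
```

In this toy run, three similar tokens collapse into one row and only the dissimilar token adds a row, which is the intuition behind state growth that is slower than the token count when inputs are redundant.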
