
Qwen3.5: Nobody Agrees on Attention Anymore

Blog post from HuggingFace

Post Details
Author: Maxime Labonne
Word Count: 1,192
Summary

Qwen3.5-397B-A17B, released by Alibaba's Qwen team, is a next-generation foundation model featuring a 397-billion-parameter Mixture-of-Experts architecture with 17 billion active parameters per token and a hybrid attention mechanism that alternates between linear and full attention. This design lets it process long contexts more efficiently: its linear attention draws on Gated Delta Networks, while reinforcement learning across diverse environments improves its adaptability. The model is natively multimodal, supports 201 languages, and excels at instruction following and visual understanding, though it is not the top performer in any single category. Its results on benchmarks such as AIME 2026, IFBench, and SWE-bench indicate strong but not dominant capabilities in reasoning, math, and coding.

The release of Qwen3.5 highlights a shift in AI development toward hybrid attention mechanisms and agentic tasks, and away from traditional chatbot benchmarks. The model is expected to expand into a family of variants, further validating its architecture.
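To make the hybrid-attention idea concrete, here is a minimal sketch of a layer schedule that alternates linear-attention blocks with periodic full-attention blocks. The 3:1 ratio and the function name are illustrative assumptions, not details confirmed by the post.

```python
# Hypothetical sketch: a hybrid attention layer schedule that interleaves
# linear-attention layers (e.g., Gated DeltaNet-style blocks) with full
# attention. The 3:1 ratio here is an assumption for illustration only.

def hybrid_schedule(num_layers: int, linear_per_full: int = 3) -> list[str]:
    """Return the attention type used at each layer index."""
    schedule = []
    for i in range(num_layers):
        # Every (linear_per_full + 1)-th layer is full attention;
        # the rest are linear attention.
        if (i + 1) % (linear_per_full + 1) == 0:
            schedule.append("full")
        else:
            schedule.append("linear")
    return schedule

print(hybrid_schedule(8))
# ['linear', 'linear', 'linear', 'full', 'linear', 'linear', 'linear', 'full']
```

The efficiency win comes from the linear layers handling most tokens at O(n) cost while the occasional full-attention layers retain global context mixing.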