
Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

Blog post from HuggingFace

Post Details
Company: HuggingFace
Date Published: -
Author: Nikita Pavlichenko
Word Count: 600
Language: -
Hacker News Points: -
Summary

JetBrains has introduced Mellum2, a 12-billion-parameter Mixture-of-Experts (MoE) model for natural language and code tasks. Only 2.5 billion parameters are activated per token, which is what allows high-throughput, low-latency inference. The model is released under the Apache 2.0 license and is optimized for routing, retrieval-augmented generation (RAG), summarization, and private deployments, particularly in software engineering contexts. According to the post, Mellum2 is competitive on benchmarks and runs inference more than twice as fast as similar-sized models, which makes it suitable for high-frequency tasks within larger AI systems. Its architecture focuses on text and code rather than multimodal input, with the aim of improving efficiency and reducing cost for real-time workloads, and JetBrains positions it as a specialized component that integrates into larger AI systems. The model is available for download on Hugging Face, and a technical report covers its architecture, training, and evaluation in detail.
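
To make "activating only 2.5 billion parameters per token" concrete, here is a minimal sketch of generic top-k expert routing, the usual mechanism behind sparse MoE layers. It is an illustration under assumptions, not Mellum2's actual design: the summary does not state the expert count, the number of experts chosen per token, or the router type, so NUM_EXPERTS, TOP_K, the toy scalar experts, and the router scores below are all hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Renormalize over the selected experts only, so the weights sum to 1.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_forward(x, experts, router_logits, k):
    """Run only the k selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Hypothetical sizes: NOT taken from the Mellum2 post or report.
NUM_EXPERTS = 8
TOP_K = 2

# Toy experts: expert i multiplies its input by (i + 1).
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

# Hypothetical router scores for a single token.
router_logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2]

selected = route_top_k(router_logits, TOP_K)
y = moe_forward(3.0, experts, router_logits, TOP_K)

# The only figures taken from the summary: 2.5B active out of 12B total.
active_fraction = 2.5 / 12

print("experts used for this token:", [i for i, _ in selected])
print(f"share of parameters active per token: {active_fraction:.1%}")
```

Only the selected experts run for a given token, which is why per-token compute tracks the active parameter count rather than the total. The active share of a real model is generally not just TOP_K / NUM_EXPERTS, because layers shared by every token (attention, embeddings) also count toward the active parameters.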