Company
Date Published
Author
The Quill
Word count
343
Language
English
Hacker News points
None

Summary

Mixtral 8x7B is a Sparse Mixture of Experts (SMoE) language model, introduced in a paper, that outperforms existing models such as Llama 2 70B and GPT-3.5 across a range of benchmarks, including mathematics, code generation, and multilingual tasks. A routing network selects two experts per token, so while the model has access to 47B parameters, it actively uses only about 13B during inference, which improves efficiency and inference speed. The fine-tuned version, Mixtral 8x7B – Instruct, excels at instruction-following tasks, shows reduced biases, and surpasses other leading models. Both versions of Mixtral are released under the Apache 2.0 license, enabling open-source integration and broad accessibility, and contributions have been made to the vLLM project to support this integration.
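
To make the top-2 routing idea concrete, here is a minimal sketch (not from the source) of a sparse Mixture-of-Experts layer in PyTorch: a router scores all experts for each token, only the two highest-scoring experts run, and their outputs are mixed with softmax weights. The `Top2MoELayer` class, its dimensions, and the expert definitions are hypothetical toy choices, far smaller and simpler than Mixtral's actual feed-forward experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Toy sparse MoE layer: route each token to its top-2 experts only.
    Hypothetical illustration, not Mixtral's actual implementation."""

    def __init__(self, d_model: int = 64, d_ff: int = 128,
                 n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). The router scores every expert for every token.
        logits = self.router(x)
        weights, chosen = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e   # tokens sent to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: 5 tokens, each processed by only 2 of the 8 experts.
tokens = torch.randn(5, 64)
print(Top2MoELayer()(tokens).shape)  # torch.Size([5, 64])
```

Because only two experts run per token, the compute per token scales with the active parameters rather than the full parameter count, which is the efficiency gain the summary describes.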