Company
Date Published
Author
The Quill
Word count
343
Language
English
Hacker News points
None

Summary

Mixtral 8x7B is a Sparse Mixture of Experts (SMoE) language model, introduced in a paper, that outperforms existing models such as Llama 2 70B and GPT-3.5 across a range of benchmarks, including mathematics, code generation, and multilingual tasks. A routing network selects two experts per token, so while the model has access to 47B parameters, it actively uses only about 13B during inference, which improves efficiency and inference speed. The fine-tuned version, Mixtral 8x7B – Instruct, excels at instruction-following tasks, shows reduced biases, and surpasses other leading models. Both versions of Mixtral are released under the Apache 2.0 license, enabling open-source integration and broad accessibility, and contributions have been made to the vLLM project to support this integration.
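
To make the top-2 routing idea concrete, here is a minimal sketch (not from the source) of a sparse Mixture-of-Experts layer in PyTorch: a router scores all experts for each token, only the two highest-scoring experts run, and their outputs are mixed with softmax weights. The `Top2MoELayer` class, its dimensions, and the expert definitions are hypothetical toy choices, far smaller and simpler than Mixtral's actual feed-forward experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Toy sparse MoE layer: route each token to its top-2 experts only.
    Hypothetical illustration, not Mixtral's actual implementation."""

    def __init__(self, d_model: int = 64, d_ff: int = 128,
                 n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). The router scores every expert for every token.
        logits = self.router(x)
        weights, chosen = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e   # tokens sent to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: 5 tokens, each processed by only 2 of the 8 experts.
tokens = torch.randn(5, 64)
print(Top2MoELayer()(tokens).shape)  # torch.Size([5, 64])
```

Because only two experts run per token, the compute per token scales with the active parameters rather than the full parameter count, which is the efficiency gain the summary describes.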