
Create Mixtures of Experts with MergeKit

Blog post from HuggingFace

Post Details

Company: HuggingFace
Date Published: -
Author: Maxime Labonne
Word Count: 2,007
Language: -
Hacker News Points: -
Summary

The Mixture of Experts (MoE) architecture has grown increasingly popular thanks to innovations like Mixtral and to new ways of building MoEs with Arcee's MergeKit library. Traditional MoEs are pre-trained from scratch, whereas MergeKit creates "frankenMoEs" by combining several already pre-trained models, offering a cheaper path to strong performance. Like any MoE, these models route each input to a subset of specialized subnetworks, or "experts", which makes training faster and inference more efficient than a dense model of comparable total size. The approach has trade-offs, however: every expert must be kept in memory, so VRAM requirements are high, and fine-tuning frankenMoEs remains challenging. The article walks through building a frankenMoE with MergeKit, selecting experts for different tasks to produce Beyonder-4x7B-v3, a model that performs well on several benchmarks. Despite these trade-offs, frankenMoEs show promise for preserving knowledge and producing robust models, especially when sufficient hardware is available.
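As a rough illustration of the workflow the article describes, the sketch below writes a MergeKit MoE ("frankenMoE") configuration from Python and hands it to the `mergekit-moe` command. It is a minimal sketch, not the exact recipe behind Beyonder-4x7B-v3: the base model, expert names, positive prompts, and CLI flags shown here are placeholders or assumptions, and the config schema follows MergeKit's documented MoE format, which may differ across versions.

```python
# Minimal sketch of driving MergeKit's MoE mode from Python.
# Assumptions: mergekit is installed (pip install mergekit) and exposes the
# mergekit-moe command-line entry point. The expert models and prompts below
# are illustrative placeholders, not the experts used in Beyonder-4x7B-v3.
import subprocess
from pathlib import Path

config = """\
base_model: mistralai/Mistral-7B-Instruct-v0.2   # placeholder base model
gate_mode: hidden   # initialise router weights from hidden states of the prompts
experts:
  - source_model: placeholder/chat-expert-7B
    positive_prompts:
      - "chat casually about everyday topics"
  - source_model: placeholder/code-expert-7B
    positive_prompts:
      - "write a Python function that"
  - source_model: placeholder/math-expert-7B
    positive_prompts:
      - "solve the following math problem step by step"
  - source_model: placeholder/story-expert-7B
    positive_prompts:
      - "write a short story about"
"""

Path("moe-config.yaml").write_text(config)

# Assemble the frankenMoE; exact flags may vary between MergeKit versions.
subprocess.run(
    ["mergekit-moe", "moe-config.yaml", "./frankenmoe-4x7b", "--copy-tokenizer"],
    check=True,
)
```

Roughly speaking, the positive prompts are used to initialise each expert's routing weights, so the merged model's router learns to send chat-like inputs to the chat expert, code-like inputs to the code expert, and so on, activating only the relevant experts per token.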