
Breaking the Dense Ceiling: How voyage-4-large Uses MoE to Scale

Blog post from Voyage AI

Post Details
Company: Voyage AI
Word Count: 1,207
Language: English
Summary

Voyage AI's voyage-4-large series replaces the dense feed-forward networks of a traditional transformer embedding model with a sparse mixture-of-experts (MoE) architecture, improving the quality-cost trade-off beyond what dense models can offer. A learned router directs each token to the most suitable experts; with standard top-k routing and an activation ratio of 1/10, only a fraction of the model's parameters are active per token, delivering the retrieval accuracy of a large model at a lower serving cost. Training techniques such as token dropping and router freezing balance training efficiency against accuracy while keeping computation synchronous and enabling effective model merging. Voyage's scaling study shows that the MoE architecture matches the retrieval accuracy of comparable dense models with a 75% reduction in active parameters, pointing to state-of-the-art performance at substantially reduced computational cost.
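The core mechanism the summary describes — a router scoring each token and activating only the top-k experts' feed-forward networks — can be sketched in a few lines. This is a minimal illustrative toy, not Voyage AI's implementation: the expert count, top-k value, dimensions, and random weights are all assumptions chosen to mimic a 1/10 activation ratio.

```python
import numpy as np

# Toy top-k MoE layer (illustrative sketch, not Voyage AI's actual code).
# With NUM_EXPERTS = 10 and TOP_K = 1, each token activates 1/10 of the
# expert parameters -- the "activation ratio" mentioned in the post.
NUM_EXPERTS = 10   # assumed expert count
TOP_K = 1          # assumed number of experts activated per token
D_MODEL = 8        # assumed hidden size

rng = np.random.default_rng(0)
router_w = rng.normal(size=(D_MODEL, NUM_EXPERTS))          # router weights
expert_w = rng.normal(size=(NUM_EXPERTS, D_MODEL, D_MODEL)) # one FFN per expert

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(tokens):
    """tokens: (n_tokens, d_model) -> (n_tokens, d_model)."""
    probs = softmax(tokens @ router_w)                 # router scores per expert
    topk = np.argsort(probs, axis=-1)[:, -TOP_K:]      # chosen expert indices
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        for e in topk[i]:
            # Only the selected experts' FFNs run -- the sparse compute saving.
            out[i] += probs[i, e] * (tok @ expert_w[e])
    return out

tokens = rng.normal(size=(4, D_MODEL))
out = moe_layer(tokens)
print(out.shape)  # (4, 8): same shape as a dense FFN's output
```

Because the unselected experts never execute, active parameters per token scale with TOP_K/NUM_EXPERTS rather than with total model size, which is how an MoE model can match a dense model's quality with roughly 75% fewer active parameters.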