
Breaking the Dense Ceiling: How voyage-4-large Uses MoE to Scale

Blog post from Voyage AI

Post Details
Company: Voyage AI
Word Count: 1,207
Language: English
Summary

Voyage AI's voyage-4-large series replaces the dense feed-forward networks of a traditional transformer embedding model with a sparse mixture-of-experts (MoE) architecture, improving the quality-cost trade-off beyond what dense models can offer. A learned router directs each token to the most suitable experts; with standard top-k routing and an activation ratio of 1/10, only a fraction of the model's parameters are active per token, delivering the retrieval accuracy of a large model at a lower serving cost. Training techniques such as token dropping and router freezing balance training efficiency against accuracy while keeping computation synchronous and enabling effective model merging. Voyage's scaling study shows that the MoE architecture matches the retrieval accuracy of comparable dense models with a 75% reduction in active parameters, pointing to state-of-the-art performance at substantially reduced computational cost.
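The core mechanism the summary describes — a router scoring each token and activating only the top-k experts' feed-forward networks — can be sketched in a few lines. This is a minimal illustrative toy, not Voyage AI's implementation: the expert count, top-k value, dimensions, and random weights are all assumptions chosen to mimic a 1/10 activation ratio.

```python
import numpy as np

# Toy top-k MoE layer (illustrative sketch, not Voyage AI's actual code).
# With NUM_EXPERTS = 10 and TOP_K = 1, each token activates 1/10 of the
# expert parameters -- the "activation ratio" mentioned in the post.
NUM_EXPERTS = 10   # assumed expert count
TOP_K = 1          # assumed number of experts activated per token
D_MODEL = 8        # assumed hidden size

rng = np.random.default_rng(0)
router_w = rng.normal(size=(D_MODEL, NUM_EXPERTS))          # router weights
expert_w = rng.normal(size=(NUM_EXPERTS, D_MODEL, D_MODEL)) # one FFN per expert

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(tokens):
    """tokens: (n_tokens, d_model) -> (n_tokens, d_model)."""
    probs = softmax(tokens @ router_w)                 # router scores per expert
    topk = np.argsort(probs, axis=-1)[:, -TOP_K:]      # chosen expert indices
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        for e in topk[i]:
            # Only the selected experts' FFNs run -- the sparse compute saving.
            out[i] += probs[i, e] * (tok @ expert_w[e])
    return out

tokens = rng.normal(size=(4, D_MODEL))
out = moe_layer(tokens)
print(out.shape)  # (4, 8): same shape as a dense FFN's output
```

Because the unselected experts never execute, active parameters per token scale with TOP_K/NUM_EXPERTS rather than with total model size, which is how an MoE model can match a dense model's quality with roughly 75% fewer active parameters.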