More efficient multi-vector embeddings with MUVERA

Post Details

Company

Weaviate

Date Published

June 5, 2025

Author

Roberto Esposito, Joon-Pil (JP) Hwang

Word Count

4,261

Language

English

Hacker News Points

-

Source URL

weaviate.io/blog/muvera

Summary

Weaviate 1.31 introduces the MUVERA encoding algorithm, which converts multi-vector embeddings into single fixed-size vectors, significantly reducing memory and computational costs. It addresses the main drawbacks of multi-vector models, namely high memory usage and slower import and search speeds, by replacing each variable-length set of vectors with one fixed-dimensional encoding. In tests on the LoTTE dataset, MUVERA reduced the memory footprint by approximately 70% and cut import times from over 20 minutes to 3-6 minutes, at the cost of a slight loss in recall. That loss can be mitigated by raising the HNSW ef value, which increases recall but may reduce query throughput. MUVERA is best suited to large-scale deployments where memory costs are substantial and to applications that can tolerate minor recall degradation. The implementation in Weaviate 1.31+ exposes configuration options for balancing these trade-offs, making it a practical way to manage large datasets efficiently.
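
To make the idea of a fixed-dimensional encoding concrete, here is a minimal sketch in plain Python. It is not Weaviate's implementation and it leaves out parts of the full algorithm (repetitions, dimensionality reduction, filling of empty buckets). It assumes only the core idea: random hyperplanes assign each token vector to a bucket, the vectors in each bucket are summed or averaged, and the buckets are concatenated. The names `make_hyperplanes`, `bucket_of`, `encode`, and `k_sim` are hypothetical and chosen for illustration.

```python
import random


def make_hyperplanes(k_sim, dim, seed=0):
    """Draw k_sim random Gaussian hyperplanes (illustrative helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k_sim)]


def bucket_of(vec, hyperplanes):
    """Map a vector to a bucket index using the sign of each projection."""
    idx = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vec, plane))
        idx = (idx << 1) | (1 if dot > 0 else 0)
    return idx


def encode(vectors, hyperplanes, average=False):
    """Collapse a variable-length list of vectors into one fixed-size vector.

    Vectors that land in the same bucket are summed (or averaged when
    average=True); the per-bucket results are then concatenated.
    """
    dim = len(hyperplanes[0])
    n_buckets = 2 ** len(hyperplanes)
    sums = [[0.0] * dim for _ in range(n_buckets)]
    counts = [0] * n_buckets
    for vec in vectors:
        b = bucket_of(vec, hyperplanes)
        counts[b] += 1
        for i, v in enumerate(vec):
            sums[b][i] += v
    if average:
        for b in range(n_buckets):
            if counts[b]:
                sums[b] = [s / counts[b] for s in sums[b]]
    return [x for bucket in sums for x in bucket]


if __name__ == "__main__":
    planes = make_hyperplanes(k_sim=3, dim=4)
    short_doc = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    long_doc = [[float(i), 1.0, -1.0, 0.5] for i in range(50)]
    # Both encodings have length 2**3 * 4 = 32, regardless of input size.
    print(len(encode(short_doc, planes)), len(encode(long_doc, planes, average=True)))
```

In this sketch the output length depends only on `k_sim` and the vector dimension, not on how many vectors the input contains, which is what lets a single-vector index such as HNSW store and search the result.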