MUVERA: Making Multivectors More Performant
Blog post from Qdrant
MUVERA embeddings, developed by Google Research, address the slowness of multi-vector search. They transform each multi-vector representation into a single fixed-size vector that is used for fast initial retrieval, and then use the original multi-vectors to rerank only the top results. This two-stage approach combines the speed of single-vector search with the accuracy of multi-vector retrieval, significantly improving search efficiency.

MUVERA embeddings are built by partitioning the vector space into clusters with a Locality-Sensitive Hashing (LSH) technique such as SimHash, aggregating the vectors that fall into each cluster, and concatenating the results. This turns a variable-length sequence of vectors into a fixed-dimensional representation.

FastEmbed 0.7.2 supports MUVERA and offers a speedup of approximately 7x while maintaining search quality, which makes it a practical option for multi-vector retrieval applications. However, MUVERA embeddings are larger than traditional single-vector embeddings, so storage and retrieval efficiency need careful consideration.
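The encoding and two-stage retrieval described above can be sketched in plain Python. This is a simplified illustration, not FastEmbed's or Qdrant's implementation. It assumes random Gaussian hyperplanes for SimHash and follows the MUVERA paper in summing query vectors and averaging document vectors per bucket, but it omits the random projections, repetitions, and empty-bucket filling that the full method uses. All function names (`fde`, `two_stage_search`, etc.) and parameters (`k_sim`, `prefetch`) are hypothetical.

```python
# Simplified, hypothetical sketch of MUVERA-style fixed-dimensional encodings (FDEs).
# Assumptions: Gaussian SimHash hyperplanes; no projections, repetitions, or bucket filling.
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def make_hyperplanes(k_sim: int, dim: int, seed: int) -> List[Vector]:
    # k_sim random Gaussian hyperplanes define 2**k_sim SimHash buckets.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k_sim)]


def simhash_bucket(v: Vector, hyperplanes: List[Vector]) -> int:
    # One bit per hyperplane: 1 if v lies on its positive side, else 0.
    bucket = 0
    for h in hyperplanes:
        bucket = (bucket << 1) | (1 if dot(v, h) > 0 else 0)
    return bucket


def fde(vectors: List[Vector], hyperplanes: List[Vector], is_query: bool) -> Vector:
    """Encode a variable-length multi-vector as one fixed-length vector."""
    dim = len(vectors[0])
    n_buckets = 2 ** len(hyperplanes)
    sums = [[0.0] * dim for _ in range(n_buckets)]
    counts = [0] * n_buckets
    for v in vectors:
        b = simhash_bucket(v, hyperplanes)
        counts[b] += 1
        for i, x in enumerate(v):
            sums[b][i] += x
    out: Vector = []
    for b in range(n_buckets):
        if is_query or counts[b] == 0:
            # Queries keep the per-bucket sum; empty buckets stay all-zero.
            out.extend(sums[b])
        else:
            # Documents keep the bucket centroid (mean of its vectors).
            out.extend(x / counts[b] for x in sums[b])
    return out  # always n_buckets * dim values, whatever len(vectors) is


def max_sim(query: List[Vector], doc: List[Vector]) -> float:
    # Late-interaction (MaxSim) score: best-matching doc vector per query vector.
    return sum(max(dot(q, d) for d in doc) for q in query)


def two_stage_search(query, docs, hyperplanes, prefetch: int, top_k: int) -> List[int]:
    # Stage 1: cheap single-vector scoring on FDEs picks `prefetch` candidates.
    # (In a real system the document FDEs would be precomputed and indexed.)
    q_fde = fde(query, hyperplanes, is_query=True)
    doc_fdes = [fde(d, hyperplanes, is_query=False) for d in docs]
    candidates = sorted(
        range(len(docs)), key=lambda i: dot(q_fde, doc_fdes[i]), reverse=True
    )[:prefetch]
    # Stage 2: exact multi-vector MaxSim reranks only those candidates.
    return sorted(candidates, key=lambda i: max_sim(query, docs[i]), reverse=True)[:top_k]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, k_sim = 8, 3

    def random_multivector(n: int) -> List[Vector]:
        return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

    hyperplanes = make_hyperplanes(k_sim, dim, seed=42)
    docs = [random_multivector(rng.randint(3, 10)) for _ in range(50)]
    query = random_multivector(4)
    print(len(fde(query, hyperplanes, is_query=True)))  # 2**3 * 8 = 64
    print(two_stage_search(query, docs, hyperplanes, prefetch=10, top_k=3))
```

The encoding length is `2**k_sim * dim` no matter how many vectors a document has, which is what lets an ordinary single-vector index handle it. The same property explains the storage trade-off: the encoding grows with the number of buckets and usually ends up larger than a typical single-vector embedding.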