Company: Qdrant
Date Published:
Author: Kacper Ɓukawski
Word count: 2036
Language: English
Hacker News points: None

Summary

Dense embedding models, traditionally used to produce a single vector per document, can be adapted for late interaction by treating their output token embeddings as multi-vector representations. Using Qdrant's multi-vector feature, these models can match or even surpass specialized late interaction models in retrieval quality while remaining simpler and more efficient. In the experiments, models such as BAAI/bge-small-en outperformed both sparse and late interaction models, though they require more storage because their output token embeddings are higher-dimensional. Vector compression can mitigate the storage and computational costs without significantly degrading retrieval quality. The new Query API in Qdrant 1.10 additionally enables multi-stage retrieval pipelines: an initial search with pooled single vectors followed by reranking with the output token embeddings. This combination promises better retrieval quality and efficiency, highlighting the potential of dense models in advanced search applications and opening opportunities for further research and optimization.