Leveraging frozen embeddings in Vespa with SentenceTransformers
Blog post from Vespa
Frozen embeddings, trained with SentenceTransformers and served from a Vespa search application, simplify the maintenance of hybrid search systems, particularly in dynamic domains such as e-commerce where search patterns change frequently. The idea is to freeze the document vector representations and update only the query representations, which reduces the need to recalculate document embeddings each time the model is retrained. The article describes how to implement frozen embeddings with a bi-encoder model that has asymmetric dense layers, how to train it with the sentence-transformers library, and how to integrate the resulting models into Vespa by exporting them to ONNX and using custom embedding components. Because the document and query models share transformer weights, the approach also uses memory efficiently, and its plug-and-play training procedure for embedding generation makes Vespa applications easier to manage and scale.
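To make the freezing idea concrete, here is a minimal, dependency-free sketch. It is not the article's implementation and does not use the sentence-transformers or Vespa APIs. Everything in it is a hypothetical stand-in: `shared_encoder` is a toy replacement for the shared transformer, `doc_head` and `query_head` play the role of the asymmetric dense layers, and `train_step` is a single hand-written gradient step on a dot-product score. The point it illustrates is that a training update touches only the query side, so document vectors computed earlier remain valid.

```python
DIM = 4  # toy embedding size (hypothetical)


def shared_encoder(text):
    """Toy stand-in for the shared, frozen transformer (not a real model)."""
    vec = [0.0] * DIM
    for i, ch in enumerate(text.lower()):
        vec[(ord(ch) + i) % DIM] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


def dense(weights, vec):
    """Apply a linear layer as a matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Asymmetric heads on top of the shared encoder.
doc_head = identity(DIM)    # frozen: never updated after indexing
query_head = identity(DIM)  # trainable: the only weights that change

# Document vectors are computed once, as they would be at feed time.
docs = ["red running shoes", "blue denim jacket"]
doc_index = {d: dense(doc_head, shared_encoder(d)) for d in docs}


def embed_query(text):
    return dense(query_head, shared_encoder(text))


def train_step(query, positive_doc, lr=0.5):
    """One gradient-ascent step on dot(query_vec, doc_vec), query head only."""
    q_feat = shared_encoder(query)
    d_vec = doc_index[positive_doc]
    # d(score)/dW[i][j] = d_vec[i] * q_feat[j]
    for i in range(DIM):
        for j in range(DIM):
            query_head[i][j] += lr * d_vec[i] * q_feat[j]


# Retrain the query side and check that the document side is untouched.
snapshot = {d: list(v) for d, v in doc_index.items()}
before = dot(embed_query("sneakers"), doc_index["red running shoes"])
train_step("sneakers", "red running shoes")
after = dot(embed_query("sneakers"), doc_index["red running shoes"])
recomputed = {d: dense(doc_head, shared_encoder(d)) for d in docs}
```

After the update, the query scores higher against its positive document, while recomputing the document vectors gives exactly the stored values. In a real setup the same property is what lets the document side stay in the index while only the query model is redeployed.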