
Exploring the potential of OpenAI Matryoshka 🪆 embeddings with Vespa

Blog post from Vespa

Post Details
Company: Vespa
Date Published:
Author: Andreas Eriksen
Word Count: 3,765
Language: English
Hacker News Points: -
Summary

In this blog post, Andreas Eriksen, a senior Vespa engineer, explores integrating OpenAI's text-embedding-3 embeddings with Vespa, focusing on the Matryoshka Representation Learning (MRL) technique. MRL allows embeddings to be shortened without losing their concept-representing properties, enabling smaller embedding footprints, faster searches, and more efficient storage. The post shows how phased ranking can re-rank the top results with the full-size embeddings, recovering accuracy close to that of using full-size embeddings throughout. An information retrieval benchmark evaluates result quality across different embedding sizes and retrieval strategies. The post also walks through creating Vespa schemas and rank profiles and deploying the application to Vespa Cloud, with particular attention to embedding flexibility and query optimization. The experiment highlights the trade-off between performance and accuracy, showing that even shortened embeddings yield strong results with significant memory savings and reduced latency.
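To make the shorten-then-rerank idea concrete, here is a minimal NumPy sketch. It is not code from the post; the function names, the 256-dimension cut-off, and the 3072-dimension vector size are illustrative assumptions. It truncates MRL embeddings to their leading coordinates and re-normalizes them, ranks all documents with the short vectors, and then re-scores only the top candidates with the full-size vectors, mirroring a first-phase/second-phase ranking split.

```python
import numpy as np

def shorten(embedding: np.ndarray, dims: int) -> np.ndarray:
    """Keep only the first `dims` coordinates of an MRL-trained embedding
    and re-normalize so dot products still behave like cosine similarity."""
    truncated = embedding[:dims]
    return truncated / np.linalg.norm(truncated)

def phased_search(query: np.ndarray, docs: np.ndarray,
                  short_dims: int = 256, rerank_count: int = 100) -> np.ndarray:
    """Phase 1: rank all documents with shortened embeddings.
    Phase 2: re-rank only the top candidates with the full-size embeddings."""
    q_short = shorten(query, short_dims)
    d_short = np.array([shorten(d, short_dims) for d in docs])

    # Phase 1: cheap dot products over the shortened vectors.
    coarse_scores = d_short @ q_short
    candidates = np.argsort(-coarse_scores)[:rerank_count]

    # Phase 2: exact scores with the full embeddings, candidates only.
    q_full = query / np.linalg.norm(query)
    d_full = docs[candidates] / np.linalg.norm(docs[candidates], axis=1, keepdims=True)
    exact_scores = d_full @ q_full

    # Return candidate indices ordered by the second-phase score.
    return candidates[np.argsort(-exact_scores)]

# Toy usage: random 3072-dim vectors standing in for text-embedding-3-large output.
rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 3072))
query = rng.normal(size=3072)
print(phased_search(query, docs)[:10])
```

In the application described in the post, this two-phase split is expressed in a Vespa rank profile (first-phase and second-phase expressions over stored tensors) rather than in client-side code; the sketch above only illustrates the scoring logic.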