
Text-to-image search with Vespa

Blog post from Vespa

Post Details
Company: Vespa
Date Published: -
Author: Lester Solbakken
Word Count: 2,157
Language: English
Hacker News Points: -
Summary

Text-to-image search has evolved significantly with the advent of machine learning, moving from reliance on manually assigned textual labels to models like OpenAI's CLIP, which understand both text and image content directly. CLIP, trained on 400 million image-text pairs, supports zero-shot learning: it can classify images using labels it never saw during training. The model consists of two sub-models, one for text and one for images, each producing a vector; matches are found by comparing these vectors with cosine distance. Vespa, a platform offering approximate nearest neighbor search and machine-learned model inference, is used to build a text-to-image search application that indexes images and retrieves them from user-provided textual descriptions. The sample application demonstrates how CLIP enables efficient and accurate image retrieval, can be applied to any image collection, and offers a solid baseline for further fine-tuning.
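The core matching step described above can be sketched in a few lines. This is a minimal illustration only: the random vectors below are stand-ins for real CLIP text and image embeddings (which would come from the model's two encoders), and the 512-dimension size matches common CLIP variants but is an assumption here.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for CLIP embeddings; in a real system these come from
# CLIP's text encoder and image encoder respectively.
rng = np.random.default_rng(0)
text_vec = rng.standard_normal(512)        # embedding of the text query
image_vecs = rng.standard_normal((3, 512)) # embeddings of indexed images

# Rank images by cosine similarity to the text query; the closest
# vector (smallest cosine distance) is the best match.
scores = [cosine_similarity(text_vec, v) for v in image_vecs]
best = int(np.argmax(scores))
```

In the actual application, this brute-force comparison is replaced by Vespa's approximate nearest neighbor search, which scales the same idea to large image collections.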