
voyage-multimodal-3.5: a new multimodal retrieval frontier with video support

Blog post from Voyage AI

Post Details
Company
Date Published
Author
Voyage AI
Word Count: 1,089
Language: English
Hacker News Points: -
Summary

voyage-multimodal-3.5 is a multimodal embedding model for retrieval over text, images, and video, and the successor to voyage-multimodal-3. It adds explicit support for video frames while keeping a unified transformer encoder that processes visual and textual inputs together, which avoids the modality gap seen in CLIP-based models. According to the post, it achieves higher retrieval accuracy than Cohere Embed v4 and Google Multimodal Embedding 001 across the evaluated datasets, including visual document and video retrieval tasks, and performs competitively on standard text retrieval. It supports Matryoshka embeddings, so the output dimensionality can be reduced as needed, and offers multiple quantization options intended to keep quality loss small. The model uses token-based pricing with free usage up to certain limits, and the post describes tooling for embedding videos and improving retrieval pipelines.
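To make the Matryoshka and quantization ideas concrete, here is a minimal, library-free sketch of the general technique. It is not Voyage AI's API or implementation: the function names `truncate_embedding` and `binarize` are hypothetical, the vector is a toy example, and binary quantization is only one common scheme, chosen here as an assumption since the summary does not name the formats offered.

```python
import math


def truncate_embedding(vec, dim):
    """Matryoshka-style shortening: keep the first `dim` components,
    then L2-renormalize so cosine and dot-product scores stay comparable.
    (Hypothetical helper, not part of any Voyage AI client.)"""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to normalize
    return [x / norm for x in head]


def binarize(vec):
    """Binary quantization (assumed scheme): one bit per dimension,
    1 for a positive component and 0 otherwise."""
    return [1 if x > 0 else 0 for x in vec]


# Toy 3-dimensional "embedding" shortened to 2 dimensions.
full = [3.0, 4.0, 12.0]
short = truncate_embedding(full, 2)
bits = binarize([0.5, -0.1, 0.0])
```

Renormalizing after truncation matters because the leading components of a unit-length vector generally do not have unit length on their own. Whether a hosted API returns already-normalized shortened vectors is something to check in its documentation rather than assume.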