voyage-multimodal-3.5: a new multimodal retrieval frontier with video support
Blog post from Voyage AI
Voyage-multimodal-3.5 is an advanced multimodal embedding model for retrieval over text, images, and video, building on its predecessor, voyage-multimodal-3. It adds explicit support for video frames while retaining a unified transformer encoder that processes visual and textual inputs together, avoiding the modality gap seen in CLIP-style dual-encoder models. Across a range of datasets, including visual-document and video retrieval tasks, it achieves higher retrieval accuracy than Cohere Embed v4 and Google Multimodal Embedding 001, while remaining competitive on standard text retrieval.

The model supports Matryoshka embeddings for flexible output dimensionality and offers multiple quantization options that minimize quality loss. It is available with token-based pricing, including free usage up to certain limits, along with tooling for embedding videos and improving retrieval pipelines.
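To make the Matryoshka and quantization properties concrete, here is a minimal sketch of the two client-side operations they enable: truncating an embedding to a smaller prefix (then re-normalizing so cosine similarity stays meaningful) and symmetric int8 quantization. The 1024-dimension vector and the specific quantization scheme are illustrative assumptions, not the model's documented defaults.

```python
import numpy as np

def truncate_matryoshka(embedding, dim):
    """Keep the first `dim` coordinates of a Matryoshka-style
    embedding and re-normalize to unit length."""
    sub = np.asarray(embedding, dtype=np.float64)[:dim]
    norm = np.linalg.norm(sub)
    return sub / norm if norm > 0 else sub

def quantize_int8(embedding):
    """Symmetric int8 quantization: scale by the max absolute value.
    A common scheme; the provider's actual method may differ."""
    emb = np.asarray(embedding, dtype=np.float64)
    scale = np.max(np.abs(emb)) / 127.0
    q = np.round(emb / scale).astype(np.int8)
    return q, scale  # keep `scale` to dequantize: q * scale

# Hypothetical full-dimension embedding (values are illustrative).
full = np.random.default_rng(0).standard_normal(1024)
full /= np.linalg.norm(full)

short = truncate_matryoshka(full, 256)   # 4x smaller index footprint
quantized, scale = quantize_int8(short)  # a further 4x via int8 storage
```

Truncation trades a small amount of retrieval accuracy for a proportionally smaller vector index; quantization stacks on top of that, which is why the two features are often used together.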