voyage-code-3: more accurate code retrieval with lower dimensional, quantized embeddings

Blog post from Voyage AI

Post Details
Company
Date Published
Author
Voyage AI
Word Count: 1,175
Language: English
Hacker News Points: -
Summary

voyage-code-3 is a next-generation embedding model built for code retrieval. It outperforms OpenAI-v3-large and CodeSage-large by significant margins across 32 code retrieval datasets. It reduces storage and search costs by supporting smaller embedding dimensions and quantized formats such as int8 and binary, enabled by Matryoshka learning and quantization-aware training, and it maintains high retrieval quality at lower precision. Embeddings are available at sizes from 256 to 2048 dimensions, and the context length is 32K tokens. To address the particular challenges of code retrieval, voyage-code-3 is trained on a diverse, high-quality code corpus and evaluated on datasets chosen to reflect real-world applications, where it shows superior performance on tasks such as text-to-code and code-to-code retrieval. Users can further improve retrieval quality with binary rescoring, and the model comes with an initial free allocation of tokens for exploration and experimentation.
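The storage savings and the rescoring step described in the summary can be illustrated with a small, self-contained sketch. It is not the Voyage AI API: every function name here is hypothetical, the vectors are toy data, and binarizing by the sign of each component is an assumption made for illustration. The sketch truncates a Matryoshka-style embedding to its leading dimensions, packs it into one bit per dimension, shortlists documents by Hamming distance, and then rescores the shortlist with the float vectors.

```python
import math

def truncate(vec, dim):
    """Keep the first `dim` components and re-normalize to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

def to_binary(vec):
    """Pack one bit per dimension into an int (assumed rule: 1 if the component is positive)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two packed binary embeddings."""
    return bin(a ^ b).count("1")

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search_with_rescoring(query, docs, dim, shortlist, k):
    """Two-stage search: binary shortlist first, float rescoring second."""
    q = truncate(query, dim)
    q_bits = to_binary(q)
    d_float = [truncate(d, dim) for d in docs]
    d_bits = [to_binary(d) for d in d_float]
    # Stage 1: cheap candidate selection by Hamming distance on binary codes.
    order = sorted(range(len(docs)), key=lambda i: hamming(q_bits, d_bits[i]))
    candidates = order[:shortlist]
    # Stage 2: re-rank only the shortlist with the float dot product.
    candidates.sort(key=lambda i: dot(q, d_float[i]), reverse=True)
    return candidates[:k]

def bytes_per_vector(dim, bits_per_value):
    """Storage for one embedding at the given dimension and precision."""
    return dim * bits_per_value // 8

# Toy 4-dimensional "embeddings" standing in for real model output.
docs = [
    [0.9, 0.1, -0.2, 0.3],
    [0.1, 0.8, 0.3, -0.1],
    [-0.7, -0.2, 0.5, 0.1],
    [0.2, 0.9, 0.1, -0.4],
]
query = [0.1, 1.0, 0.2, -0.3]

top = search_with_rescoring(query, docs, dim=4, shortlist=2, k=2)
full_cost = bytes_per_vector(2048, 32)   # 2048-dim float32
small_cost = bytes_per_vector(256, 1)    # 256-dim binary
```

In this toy run, documents 1 and 3 tie on Hamming distance, and the float rescoring is what orders them. On the storage side, a 2048-dimensional float32 vector takes 8,192 bytes, while a 256-dimensional binary vector takes 32 bytes, a 256x reduction per stored embedding.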