voyage-code-3: More Accurate Code Retrieval With Lower Dimensional, Quantized Embeddings
Blog post from MongoDB
voyage-code-3 is an embedding model optimized for code retrieval. On a suite of 32 code retrieval datasets, it outperforms other models, including OpenAI-v3-large and CodeSage-large, by notable margins. It combines Matryoshka learning with quantization techniques to support lower-dimensional and more compactly encoded embeddings. These significantly reduce storage and search costs while preserving high retrieval quality. The model targets complex code retrieval tasks and is trained on diverse, high-quality data curated from public repositories and real-world scenarios. It supports multiple embedding dimensions and quantization formats, which makes it adaptable to a range of applications and cost budgets.

Separately, MongoDB has announced a leadership transition: Chirantan "CJ" Desai is succeeding Dev Ittycheria as CEO, marking a new phase of growth for the company. MongoDB is also strengthening its AI Applications Program by expanding its partner network to include companies such as Capgemini and IBM. The goal is to give customers AI expertise and innovative solutions.
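The voyage-code-3 summary names two cost levers. Matryoshka learning lets a prefix of an embedding serve as a smaller embedding. Quantization stores each coordinate in fewer bits. Below is a minimal, stdlib-only Python sketch of both ideas on toy vectors, assuming float embeddings whose leading coordinates carry the most information (the Matryoshka property).

The sketch does not call the Voyage API. All function names (`truncate_matryoshka`, `quantize_int8`, `binarize`, `hamming`, `storage_bytes`) are hypothetical. The fixed-range min-max int8 scheme is a generic stand-in, not necessarily how voyage-code-3 produces its quantized outputs.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(vec: Vector) -> Vector:
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def truncate_matryoshka(vec: Vector, dim: int) -> Vector:
    """Keep the first `dim` coordinates and re-normalize.

    Assumes a Matryoshka-trained embedding, so the prefix is still usable.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be in 1..len(vec)")
    return l2_normalize(vec[:dim])


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def quantize_int8(vec: Vector, lo: float = -1.0, hi: float = 1.0) -> List[int]:
    """Min-max scalar quantization of [lo, hi] onto the int8 range [-128, 127].

    The fixed [lo, hi] range is a simplifying assumption; real systems calibrate it.
    """
    scale = 255.0 / (hi - lo)
    out = []
    for x in vec:
        clipped = min(max(x, lo), hi)  # values outside the range saturate
        out.append(round((clipped - lo) * scale) - 128)
    return out


def binarize(vec: Vector) -> int:
    """Pack the sign of each coordinate into one integer bitset (bit i = 1 if vec[i] > 0)."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Count differing bits; a smaller distance means more similar binary vectors."""
    return bin(a ^ b).count("1")


def storage_bytes(dim: int, fmt: str) -> int:
    """Bytes needed to store one embedding of `dim` values in the given format."""
    bits_per_value = {"float32": 32, "int8": 8, "binary": 1}[fmt]
    return (dim * bits_per_value + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # Toy 8-dimensional vectors; real embeddings are far larger.
    query = [0.12, -0.40, 0.33, 0.05, -0.21, 0.60, -0.08, 0.11]
    doc = [0.10, -0.35, 0.30, 0.02, -0.25, 0.55, 0.04, -0.09]

    print("full cosine:", cosine(query, doc))
    print("4-dim cosine:", cosine(truncate_matryoshka(query, 4), truncate_matryoshka(doc, 4)))
    print("int8 query:", quantize_int8(query))
    print("hamming:", hamming(binarize(query), binarize(doc)))  # 2 (signs differ at indices 6 and 7)
    # Per-vector storage: float32 at 1024 dims vs. binary at 256 dims.
    print(storage_bytes(1024, "float32"), storage_bytes(256, "binary"))  # 4096 32
```

Binary vectors are compared with Hamming distance (XOR plus a bit count), which is much cheaper than a floating-point dot product. The last line shows why pairing a shorter prefix with 1-bit values shrinks an index so sharply. In this illustrative configuration, each vector drops from 4096 bytes to 32 bytes, a 128x reduction.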