
How do we evaluate vector-based code retrieval?

Blog post from Voyage AI

Post Details

Company: Voyage AI
Word Count: 2,078
Language: English
Summary

Modern coding assistants and agents rely on code retrieval systems, which typically use embedding models to perform vector-based searches for relevant code snippets, docstrings, and documentation across large repositories. Despite their widespread use, evaluating the quality of these systems remains difficult because diverse benchmarking datasets and methodologies are scarce.

Voyage AI addresses these gaps by examining the typical subtasks of code retrieval, such as text-to-code, code-to-code, and docstring-to-code, and by analyzing existing datasets like CodeSearchNet and CoSQA, which suffer from limitations such as noisy labels and a lack of deep algorithmic reasoning. To improve code retrieval evaluation, Voyage AI proposes building new datasets by repurposing question-answer datasets and mining code repositories such as GitHub. The company has assembled a mix of public and proprietary datasets that avoid contamination and support robust evaluation across a range of code embedding models. Future efforts include sharing in-house datasets for collaborative research and exploring large language models as judges of retrieval performance.
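The pipeline the summary describes, embedding a query, ranking code snippets by vector similarity, and scoring the ranking with a retrieval metric, can be sketched in a few lines. The toy vectors, snippet names, and the recall@k metric below are illustrative assumptions, not Voyage AI's actual models or benchmark data; a real system would obtain the vectors from an embedding model.

```python
import math

# Toy corpus: snippet identifier -> pretend embedding vector.
# In practice these would come from a code embedding model.
CORPUS = {
    "binary_search": [0.9, 0.1, 0.0],
    "quicksort":     [0.1, 0.9, 0.1],
    "http_get":      [0.0, 0.2, 0.9],
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=2):
    # Rank all snippets by similarity to the query and return the top k.
    ranked = sorted(CORPUS, key=lambda s: cosine(query_vec, CORPUS[s]),
                    reverse=True)
    return ranked[:k]

def recall_at_k(results, relevant, k):
    # Fraction of the relevant snippets that appear in the top-k results.
    hits = sum(1 for r in results[:k] if r in relevant)
    return hits / len(relevant)

# Pretend embedding of a text query like "find element in sorted list".
query = [0.85, 0.15, 0.05]
top = retrieve(query, k=2)
print(top)                                        # ranked snippet ids
print(recall_at_k(top, {"binary_search"}, k=2))   # 1.0 if the label is found
```

Benchmarks like the ones discussed in the post average such per-query metrics (recall@k, NDCG, and similar) over many labeled query-snippet pairs, which is why label noise in datasets like CodeSearchNet directly distorts the scores.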