
How do we evaluate vector-based code retrieval?

Blog post from Voyage AI

Post Details

Company: Voyage AI
Word Count: 2,078
Language: English
Summary

Modern coding assistants and agents rely on code retrieval systems, which typically use embedding models to perform vector-based searches for relevant code snippets, docstrings, and documentation across large repositories. Despite their widespread use, evaluating the quality of these systems remains difficult because diverse benchmarking datasets and methodologies are scarce.

Voyage AI addresses these gaps by examining the typical subtasks of code retrieval, such as text-to-code, code-to-code, and docstring-to-code, and by analyzing existing datasets like CodeSearchNet and CoSQA, which suffer from limitations such as noisy labels and a lack of deep algorithmic reasoning. To improve code retrieval evaluation, Voyage AI proposes building new datasets by repurposing question-answer datasets and mining code repositories such as GitHub. The company has assembled a mix of public and proprietary datasets that avoid contamination and support robust evaluation across a range of code embedding models. Future efforts include sharing in-house datasets for collaborative research and exploring large language models as judges of retrieval performance.
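The pipeline the summary describes, embedding a query, ranking code snippets by vector similarity, and scoring the ranking with a retrieval metric, can be sketched in a few lines. The toy vectors, snippet names, and the recall@k metric below are illustrative assumptions, not Voyage AI's actual models or benchmark data; a real system would obtain the vectors from an embedding model.

```python
import math

# Toy corpus: snippet identifier -> pretend embedding vector.
# In practice these would come from a code embedding model.
CORPUS = {
    "binary_search": [0.9, 0.1, 0.0],
    "quicksort":     [0.1, 0.9, 0.1],
    "http_get":      [0.0, 0.2, 0.9],
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=2):
    # Rank all snippets by similarity to the query and return the top k.
    ranked = sorted(CORPUS, key=lambda s: cosine(query_vec, CORPUS[s]),
                    reverse=True)
    return ranked[:k]

def recall_at_k(results, relevant, k):
    # Fraction of the relevant snippets that appear in the top-k results.
    hits = sum(1 for r in results[:k] if r in relevant)
    return hits / len(relevant)

# Pretend embedding of a text query like "find element in sorted list".
query = [0.85, 0.15, 0.05]
top = retrieve(query, k=2)
print(top)                                        # ranked snippet ids
print(recall_at_k(top, {"binary_search"}, k=2))   # 1.0 if the label is found
```

Benchmarks like the ones discussed in the post average such per-query metrics (recall@k, NDCG, and similar) over many labeled query-snippet pairs, which is why label noise in datasets like CodeSearchNet directly distorts the scores.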