
Evaluating RAG for large scale codebases

Blog post from Qodo

Post Details
Company: Qodo
Date Published: -
Author: Assaf Pinhasi
Word Count: 2,704
Language: English
Hacker News Points: -
Summary

Qodo has built a RAG-based system for generative AI coding assistants, aimed at improving code quality in large-scale enterprise environments. Because the system answers questions over large, private code corpora, evaluating its outputs, especially answer correctness and retrieval accuracy, is central to its development. To make that evaluation tractable, Qodo established a framework with three main elements: an LLM-as-a-judge to assess answer accuracy, a ground-truth dataset created with domain experts, and an automated process for generating diverse, realistic question-answer pairs. The evaluation is integrated into Qodo's development workflows, using RAGAS alongside custom LLM-as-a-judge models. These efforts have streamlined regression testing, substantially reducing the manual effort needed to verify how code changes affect system quality, and provide a reliable, repeatable measure of the RAG system's performance that supports its continuous improvement.
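To make the LLM-as-a-judge idea concrete, here is a minimal Python sketch of scoring candidate answers against an expert ground-truth dataset. It is illustrative only, not Qodo's actual implementation: `call_llm` is a hypothetical stand-in for whatever completion client is in use, and the rubric, scale, and field names are assumptions.

```python
import json

# Hypothetical judging rubric; the real prompt and scale would be tuned to the task.
JUDGE_PROMPT = """You are grading a coding assistant's answer.
Question: {question}
Ground-truth answer (written by a domain expert): {ground_truth}
Candidate answer: {answer}

Score the candidate from 1 (wrong) to 5 (fully correct and complete).
Respond only with JSON: {{"score": <int>, "reason": "<string>"}}"""


def judge_answer(question: str, ground_truth: str, answer: str, call_llm) -> dict:
    """Ask a judge model to score `answer` against the expert ground truth.

    `call_llm` is an assumed callable: prompt string in, completion string out.
    """
    raw = call_llm(JUDGE_PROMPT.format(
        question=question, ground_truth=ground_truth, answer=answer))
    return json.loads(raw)  # assumes the judge complied with the JSON format


def regression_score(dataset: list[dict], call_llm) -> float:
    """Mean judge score over records of the form
    {"question": ..., "ground_truth": ..., "answer": ...}."""
    scores = [
        judge_answer(d["question"], d["ground_truth"], d["answer"], call_llm)["score"]
        for d in dataset
    ]
    return sum(scores) / len(scores)
```

A baseline `regression_score` computed on the current system can then be compared against the score for each candidate change, which is what turns a one-off evaluation into the kind of automated regression check the post describes.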