
Evaluating RAG for large scale codebases

Blog post from Qodo

Post Details
Company: Qodo
Date Published: -
Author: Assaf Pinhasi
Word Count: 2,704
Language: English
Hacker News Points: -
Summary

Qodo has built a RAG-based system for generative AI coding assistants, aimed at improving code quality in large-scale enterprise environments. Because the system answers questions over large, private code corpora, evaluating its outputs, especially answer correctness and retrieval accuracy, is central to its development. To make that evaluation tractable, Qodo established a framework with three main elements: an LLM-as-a-judge to assess answer accuracy, a ground-truth dataset created with domain experts, and an automated process for generating diverse, realistic question-answer pairs. The evaluation is integrated into Qodo's development workflows, using RAGAS alongside custom LLM-as-a-judge models. These efforts have streamlined regression testing, substantially reducing the manual effort needed to verify how code changes affect system quality, and provide a reliable, repeatable measure of the RAG system's performance that supports its continuous improvement.
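To make the LLM-as-a-judge idea concrete, here is a minimal Python sketch of scoring candidate answers against an expert ground-truth dataset. It is illustrative only, not Qodo's actual implementation: `call_llm` is a hypothetical stand-in for whatever completion client is in use, and the rubric, scale, and field names are assumptions.

```python
import json

# Hypothetical judging rubric; the real prompt and scale would be tuned to the task.
JUDGE_PROMPT = """You are grading a coding assistant's answer.
Question: {question}
Ground-truth answer (written by a domain expert): {ground_truth}
Candidate answer: {answer}

Score the candidate from 1 (wrong) to 5 (fully correct and complete).
Respond only with JSON: {{"score": <int>, "reason": "<string>"}}"""


def judge_answer(question: str, ground_truth: str, answer: str, call_llm) -> dict:
    """Ask a judge model to score `answer` against the expert ground truth.

    `call_llm` is an assumed callable: prompt string in, completion string out.
    """
    raw = call_llm(JUDGE_PROMPT.format(
        question=question, ground_truth=ground_truth, answer=answer))
    return json.loads(raw)  # assumes the judge complied with the JSON format


def regression_score(dataset: list[dict], call_llm) -> float:
    """Mean judge score over records of the form
    {"question": ..., "ground_truth": ..., "answer": ...}."""
    scores = [
        judge_answer(d["question"], d["ground_truth"], d["answer"], call_llm)["score"]
        for d in dataset
    ]
    return sum(scores) / len(scores)
```

A baseline `regression_score` computed on the current system can then be compared against the score for each candidate change, which is what turns a one-off evaluation into the kind of automated regression check the post describes.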