
DeepCodeBench: Real-World Codebase Understanding by Q&A Benchmarking

Blog post from Qodo

Post Details
Author: Ravid Cohen
Word Count: 1,680
Language: English
Summary

Qodo has developed a benchmark dataset of real-world questions derived from complex code repositories, intended to support research and development on code retrieval systems. It addresses a gap left by existing benchmarks, which often rely on artificially generated code snippets or focus on retrieval from databases rather than from code repositories. The questions were derived from pull requests (PRs), which are rich sources of complex, interconnected code changes; large language models (LLMs) were then used to generate realistic developer questions and answers from them. Evaluation uses a method called "fact recall", which assesses a prediction objectively by verifying that the discrete facts in the ground-truth answer are present in the predicted answer. Qodo's Deep Research agent outperformed other agents, including OpenAI's Codex and Anthropic's Claude, on fact recall, demonstrating both speed and accuracy in retrieving code-related information. The release includes 1,144 question-answer pairs along with the metadata, context, and prompts used to create the dataset, with the aim of improving AI-assisted code navigation and comprehension tools.
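As a rough illustration of the fact recall idea described in the summary, the score can be read as the fraction of ground-truth facts that a predicted answer supports. The Python sketch below is a minimal version under stated assumptions, not Qodo's implementation: the names `fact_recall` and `naive_judge` are hypothetical, and the word-overlap check is only a stand-in for whatever verifier the benchmark actually uses to decide whether a fact is present.

```python
from typing import Callable, Sequence


def naive_judge(fact: str, predicted_answer: str) -> bool:
    """Hypothetical stand-in verifier (an assumption, not the benchmark's judge).

    Treats a fact as present if every word of the fact appears
    somewhere in the predicted answer, ignoring case.
    """
    answer_words = set(predicted_answer.lower().split())
    return all(word in answer_words for word in fact.lower().split())


def fact_recall(
    facts: Sequence[str],
    predicted_answer: str,
    judge: Callable[[str, str], bool] = naive_judge,
) -> float:
    """Return the fraction of ground-truth facts found in the predicted answer."""
    if not facts:
        raise ValueError("at least one ground-truth fact is required")
    supported = sum(1 for fact in facts if judge(fact, predicted_answer))
    return supported / len(facts)


# Illustrative data only; these facts are made up for the example.
ground_truth_facts = [
    "the cache is invalidated on write",
    "the service uses redis",
]
prediction = "the cache is invalidated on write by the handler"

print(fact_recall(ground_truth_facts, prediction))  # 0.5
```

Passing the verifier in as the `judge` argument keeps the scoring arithmetic separate from the question of how a fact is matched, so a stricter or model-based check could be swapped in without changing `fact_recall`.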