How Evaluation-Driven Development (EDD) Works

Post Details

Company

Comet

Date Published

July 2, 2026

Author

Paul Iusztin

Word Count

2,908

Company Posts That Month

1

Language

English

Hacker News Points

-

Source URL

www.comet.com/site/blog/edd-opik-project-example

Summary

Evaluation-Driven Development (EDD) is a structured approach to AI feature development that ensures changes are effective and do not introduce regressions before they are merged into the main codebase. The process involves generating test data to simulate real-world scenarios and using an open-source tool called Opik for running experiments and evaluating the performance of new features. EDD relies on two modes of testing: a quick manual check for minor adjustments and automated experiments for larger changes, with simulated traces covering both happy paths and adversarial conditions. The evaluation process is hypothesis-driven, starting with a stated hypothesis for each feature, followed by simulations and comparisons of results using predefined metrics and judges. This method helps catch subtle errors that might not be visible in individual traces but become apparent over longer interactions, thus preventing potential costly mistakes in live environments. Alejandro Aboy, a senior data and AI engineer, exemplifies this approach in his work with Workpath, leveraging Opik to maintain alignment in enterprise strategy execution and demonstrating how offline evaluations can be more cost-effective and insightful than always-on online evaluations.

Trends Found in this Post

No tracked trend matches for this post yet.