Evals are just tests, so why aren’t engineers writing them?

Post Details

Company

Sentry

Date Published

July 29, 2025

Author

Eli Hooten

Word Count

1,248

Language

English

Source URL

blog.sentry.io/evals-are-just-tests-so-why-arent-engineers-writing-them

Summary

Evals (evaluations) are tests that assess the performance of AI models, but they often live outside the main development workflow, which creates inefficiencies and integration problems. Unlike traditional tests, evals produce complex metrics rather than simple pass/fail results, which makes them harder to interpret and to fit into existing testing frameworks. Sentry has tackled these problems by building tools such as "vitest-evals," which let evals run like unit tests in CI/CD pipelines, with local execution, straightforward debugging, and standardized reporting formats such as JUnit XML. The goal is to fold evals into the regular development process so that AI quality assessment becomes as routine and reliable as code quality testing, improving development velocity and reducing organizational friction.
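The central idea above is to collapse a graded eval metric into the pass/fail result a test runner expects, then report it in a format CI already understands, such as JUnit XML. The sketch below illustrates that idea in plain TypeScript. It is a framework-free illustration under stated assumptions and is not the vitest-evals API. The scorer (`tokenOverlapScore`), the 0.8 threshold, the helper names, and the stand-in model are all hypothetical.

```typescript
// Hypothetical sketch: NOT the vitest-evals API, just the underlying idea.
interface EvalCase {
  name: string;
  input: string;
  expected: string;
}

interface EvalResult {
  name: string;
  score: number; // graded metric in [0, 1]
  passed: boolean; // the binary signal a test runner needs
}

// Hypothetical scorer: fraction of expected tokens that appear in the output.
function tokenOverlapScore(output: string, expected: string): number {
  const expectedTokens = expected.toLowerCase().split(/\s+/).filter(Boolean);
  if (expectedTokens.length === 0) return 1;
  const outputTokens = new Set(output.toLowerCase().split(/\s+/).filter(Boolean));
  const hits = expectedTokens.filter((t) => outputTokens.has(t)).length;
  return hits / expectedTokens.length;
}

// Run every case and turn its graded score into pass/fail via a threshold.
function runEval(
  evalCases: EvalCase[],
  task: (input: string) => string,
  threshold: number,
): EvalResult[] {
  return evalCases.map((c) => {
    const score = tokenOverlapScore(task(c.input), c.expected);
    return { name: c.name, score, passed: score >= threshold };
  });
}

function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Emit JUnit XML so CI can display eval results like any other test suite.
// The score goes into the failure message so the graded metric is not lost.
function toJUnitXml(suite: string, results: EvalResult[]): string {
  const failures = results.filter((r) => !r.passed).length;
  const testcases = results.map((r) => {
    const open = `  <testcase classname="${escapeXml(suite)}" name="${escapeXml(r.name)}"`;
    if (r.passed) return `${open}/>`;
    return `${open}>\n    <failure message="score ${r.score.toFixed(2)} below threshold"/>\n  </testcase>`;
  });
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<testsuite name="${escapeXml(suite)}" tests="${results.length}" failures="${failures}">`,
    ...testcases,
    `</testsuite>`,
  ].join("\n");
}

// Usage with a hypothetical stand-in for a real model call.
const cases: EvalCase[] = [
  { name: "capital of France", input: "What is the capital of France?", expected: "Paris" },
  { name: "color of sky", input: "What color is the sky?", expected: "blue" },
];
const fakeModel = (input: string): string =>
  input.includes("France") ? "The capital is Paris" : "It is green";

const results = runEval(cases, fakeModel, 0.8);
console.log(toJUnitXml("qa-evals", results));
```

Here the first case scores 1.0 and passes, and the second scores 0.0 and is reported as a JUnit failure. Keeping the raw score in the failure message is one way to preserve the richer metric while still giving CI the binary signal it needs.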