
Evals are just tests, so why aren’t engineers writing them?

Blog post from Sentry

Post Details
Company: Sentry
Author: Eli Hooten
Word Count: 1,248
Language: English
Summary

Evals (evaluations) are tests that assess how well an AI model performs, yet they often live outside the main development workflow, which creates inefficiencies and integration problems. Unlike traditional tests, evals typically produce scores and other metrics rather than a simple pass/fail result, so they are harder to interpret and harder to fit into existing testing frameworks. Sentry has addressed this by building tools such as "vitest-evals," which lets evals run like unit tests inside CI/CD pipelines, with local execution, easier debugging, and standardized report formats such as JUnit XML. The goal is to fold evals into the regular development process so that assessing AI quality is as routine and reliable as testing code quality, improving development velocity and reducing organizational friction.
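
To make the idea concrete, here is a minimal sketch of how a scored eval can be reduced to a pass/fail test and reported as JUnit XML. It deliberately does not use the vitest-evals API: every name in it (`EvalCase`, `fakeModel`, `tokenOverlap`, `runEval`, `toJUnitXml`) is a hypothetical stand-in, the "model" is a canned lookup, and the XML is only a small subset of what JUnit reporters usually emit.

```typescript
// Illustrative only: all names here are hypothetical, not the vitest-evals API.

interface EvalCase {
  name: string;
  input: string;
  expected: string;
}

interface EvalResult {
  name: string;
  score: number; // between 0 and 1
  passed: boolean;
}

// Stand-in for a real model call (which would normally be asynchronous).
function fakeModel(input: string): string {
  const canned: Record<string, string> = {
    "What is the capital of France?": "Paris is the capital of France",
    "What is 2 + 2?": "five",
  };
  return canned[input] ?? "";
}

// Scorer: the fraction of expected tokens that appear in the output.
function tokenOverlap(output: string, expected: string): number {
  const tokenize = (s: string) => s.toLowerCase().split(/\s+/).filter(Boolean);
  const want = tokenize(expected);
  if (want.length === 0) return 1;
  const got = new Set(tokenize(output));
  return want.filter((t) => got.has(t)).length / want.length;
}

// Run every case and turn each score into pass/fail using a threshold.
function runEval(
  cases: EvalCase[],
  task: (input: string) => string,
  scorer: (output: string, expected: string) => number,
  threshold: number,
): EvalResult[] {
  return cases.map((c) => {
    const score = scorer(task(c.input), c.expected);
    return { name: c.name, score, passed: score >= threshold };
  });
}

function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Emit a minimal JUnit-style report: one <testcase> per eval case.
function toJUnitXml(suite: string, results: EvalResult[]): string {
  const failures = results.filter((r) => !r.passed).length;
  const body = results.map((r) =>
    r.passed
      ? `  <testcase name="${escapeXml(r.name)}"/>`
      : `  <testcase name="${escapeXml(r.name)}"><failure message="score ${r.score.toFixed(2)} below threshold"/></testcase>`,
  );
  return [
    `<testsuite name="${escapeXml(suite)}" tests="${results.length}" failures="${failures}">`,
    ...body,
    `</testsuite>`,
  ].join("\n");
}

const cases: EvalCase[] = [
  { name: "capital", input: "What is the capital of France?", expected: "Paris" },
  { name: "arithmetic", input: "What is 2 + 2?", expected: "4" },
];

const results = runEval(cases, fakeModel, tokenOverlap, 0.6);
console.log(toJUnitXml("qa-evals", results));
```

In a real setup, each case would presumably be registered with the test runner so that the threshold check becomes an ordinary assertion and the runner's own reporter writes the JUnit file. The point of the sketch is only that a score plus a threshold is enough to make an eval behave like a test.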