Content Deep Dive

Catch what tests miss: Online evaluations for AI capabilities

Blog post from Axiom

Post Details

Company: Axiom
Date Published: -
Author: -
Word Count: 1,605
Language: English
Hacker News Points: -
Summary

Online evaluations continuously monitor the quality of AI capabilities by scoring production outputs in real time, without requiring ground truth, using the same Scorer API as offline evaluations. Scorers execute fire-and-forget in the background, so they never block user responses, and per-scorer sampling keeps evaluation costs under control.

Each evaluation span is linked to the originating generation span via OpenTelemetry, so any low score can be traced back to its source, closing the loop between production failures and offline test cases. By attaching scoring functions — both structural and semantic checks — to live production traffic, online evaluations fill the gaps left by offline evaluations and user feedback, providing continuous visibility into AI performance.

Results surface in the Axiom Console for detailed analysis and iteration on production data, improving test coverage and reducing the time to actionable insight.
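To make the pattern concrete, here is a minimal hypothetical sketch of the ideas the summary describes — a scorer that needs no ground truth, per-scorer sampling, and fire-and-forget background execution. All names (`Scorer`, `sampleRate`, `scoreInBackground`) are illustrative assumptions, not Axiom's actual Scorer API:

```typescript
// Illustrative sketch only; these types and names are assumptions,
// not Axiom's real SDK surface.

type Scorer = {
  name: string;
  sampleRate: number; // fraction of production outputs this scorer sees (0..1)
  score: (output: string) => number; // 0..1 score, computed without ground truth
};

// A structural check: did the model emit valid JSON?
const validJson: Scorer = {
  name: "valid-json",
  sampleRate: 0.1, // score ~10% of traffic to manage cost
  score: (output) => {
    try {
      JSON.parse(output);
      return 1;
    } catch {
      return 0;
    }
  },
};

// Fire-and-forget: sampled scorers run after the response is returned,
// so scoring never blocks the user-facing path.
function scoreInBackground(
  output: string,
  scorers: Scorer[],
  record: (name: string, value: number) => void,
): void {
  for (const s of scorers) {
    if (Math.random() < s.sampleRate) {
      queueMicrotask(() => record(s.name, s.score(output)));
    }
  }
}
```

In a real deployment, `record` would emit an evaluation span linked to the originating generation span so low scores remain traceable to their source.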