Company:
Date Published:
Author: -
Word count: 594
Language: English
Hacker News points: None

Summary

LangSmith has introduced Align Evals, a new feature for improving how closely evaluator scores track human preferences in application development, particularly when language models are used as judges. Inspired by Eugene Yan's work, the feature lets LangSmith Cloud users (and soon LangSmith Self-Hosted users) calibrate evaluators to better reflect human judgment through an interactive interface that supports prompt iteration and side-by-side comparison of human and LLM-generated scores. Align Evals addresses evaluator score discrepancies by providing tools to identify unaligned cases, establish a baseline alignment score, and iteratively refine evaluator prompts until alignment improves. The workflow has developers select evaluation criteria, create a representative dataset for human review, assign expected scores, and then test LLM evaluator prompts against those benchmarks. Planned enhancements include performance analytics and automatic prompt optimization to help developers build more effective evaluators.
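The core loop the summary describes — comparing an LLM judge's scores against human-assigned expected scores, computing a baseline alignment score, and surfacing the unaligned cases for prompt iteration — can be sketched as follows. This is a hypothetical illustration, not LangSmith's actual API; the `alignment_report` function and the example dataset fields (`id`, `human_score`, `llm_score`) are assumed names for the sake of the sketch.

```python
# Hypothetical sketch of the align-evals loop: given human "expected" scores
# and scores produced by an LLM-as-judge evaluator, compute a simple baseline
# alignment score (fraction of exact matches) and flag unaligned cases so the
# evaluator prompt can be refined and re-tested against the same benchmark.

def alignment_report(examples):
    """examples: list of dicts with 'id', 'human_score', 'llm_score'."""
    unaligned = [ex for ex in examples if ex["human_score"] != ex["llm_score"]]
    score = 1 - len(unaligned) / len(examples) if examples else 0.0
    return score, unaligned

if __name__ == "__main__":
    # A small representative dataset after human review (assumed data).
    dataset = [
        {"id": "ex-1", "human_score": 1, "llm_score": 1},
        {"id": "ex-2", "human_score": 0, "llm_score": 1},  # evaluator disagrees
        {"id": "ex-3", "human_score": 1, "llm_score": 1},
        {"id": "ex-4", "human_score": 0, "llm_score": 0},
    ]
    score, unaligned = alignment_report(dataset)
    print(f"baseline alignment: {score:.2f}")             # 0.75
    print("cases to inspect:", [ex["id"] for ex in unaligned])  # ['ex-2']
```

After editing the evaluator prompt, rerunning the same report against the unchanged human scores shows whether alignment actually improved.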