Company:
Date Published:
Author: -
Word count: 594
Language: English
Hacker News points: None

Summary

LangSmith has introduced Align Evals, a new feature for improving how closely evaluator scores track human preferences in application development, particularly when language models are used as judges. Inspired by Eugene Yan's work, the feature lets LangSmith Cloud users (and soon LangSmith Self-Hosted users) calibrate evaluators to better reflect human judgment through an interactive interface that supports prompt iteration and side-by-side comparison of human and LLM-generated scores. Align Evals addresses evaluator score discrepancies by providing tools to identify unaligned cases, establish a baseline alignment score, and iteratively refine evaluator prompts until alignment improves. The workflow has developers select evaluation criteria, create a representative dataset for human review, assign expected scores, and then test LLM evaluator prompts against those benchmarks. Planned enhancements include performance analytics and automatic prompt optimization to help developers build more effective evaluators.
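The core loop the summary describes — comparing an LLM judge's scores against human-assigned expected scores, computing a baseline alignment score, and surfacing the unaligned cases for prompt iteration — can be sketched as follows. This is a hypothetical illustration, not LangSmith's actual API; the `alignment_report` function and the example dataset fields (`id`, `human_score`, `llm_score`) are assumed names for the sake of the sketch.

```python
# Hypothetical sketch of the align-evals loop: given human "expected" scores
# and scores produced by an LLM-as-judge evaluator, compute a simple baseline
# alignment score (fraction of exact matches) and flag unaligned cases so the
# evaluator prompt can be refined and re-tested against the same benchmark.

def alignment_report(examples):
    """examples: list of dicts with 'id', 'human_score', 'llm_score'."""
    unaligned = [ex for ex in examples if ex["human_score"] != ex["llm_score"]]
    score = 1 - len(unaligned) / len(examples) if examples else 0.0
    return score, unaligned

if __name__ == "__main__":
    # A small representative dataset after human review (assumed data).
    dataset = [
        {"id": "ex-1", "human_score": 1, "llm_score": 1},
        {"id": "ex-2", "human_score": 0, "llm_score": 1},  # evaluator disagrees
        {"id": "ex-3", "human_score": 1, "llm_score": 1},
        {"id": "ex-4", "human_score": 0, "llm_score": 0},
    ]
    score, unaligned = alignment_report(dataset)
    print(f"baseline alignment: {score:.2f}")             # 0.75
    print("cases to inspect:", [ex["id"] for ex in unaligned])  # ['ex-2']
```

After editing the evaluator prompt, rerunning the same report against the unchanged human scores shows whether alignment actually improved.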