Better Harness: A Recipe for Harness Hill-Climbing with Evals

Post Details

Company

LangChain

Date Published

April 8, 2026

Author

-

Word Count

2,059

Language

English

Source URL

www.langchain.com/blog/better-harness-a-recipe-for-harness-hill-climbing-with-evals

Summary

Better-Harness is a system for improving AI agents by iteratively refining their harnesses, using evaluations (evals) as the learning signal in much the same way that training data serves machine learning. The approach depends on high-quality evals, sourced from hand-curated examples, production traces, and external datasets, to guide agents toward desired behaviors. The system runs a cycle of data sourcing, experiment design, optimization, and review. Evals are categorized by behavioral tags to enable targeted experiments, and holdout sets check that improvements generalize rather than overfit. By integrating human review and trace analysis, Better-Harness aims to discover and address failure modes while avoiding regressions. Tests with models such as Claude Sonnet 4.6 and Z.ai’s GLM-5 showed improved agent behavior, which suggests the approach could autonomously refine agent harnesses and adapt to various domains.
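
To make the tagging and holdout idea concrete, here is a minimal Python sketch, not the Better-Harness implementation. Every name in it (`EvalCase`, `split_by_tag`, `pass_rate_by_tag`, `accept_change`) and the 25% holdout fraction are assumptions made for illustration. It splits evals per behavioral tag into an optimization set and a holdout set, then accepts a harness change only if no tag regresses and at least one tag improves.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    """Hypothetical eval record: an id plus one behavioral tag."""
    case_id: str
    tag: str  # e.g. "tool_use" or "planning" (illustrative tag names)


def split_by_tag(cases, holdout_fraction=0.25, seed=0):
    """Split each tag's cases into (optimization, holdout) lists."""
    rng = random.Random(seed)
    by_tag = defaultdict(list)
    for case in cases:
        by_tag[case.tag].append(case)

    optimization, holdout = [], []
    for tag in sorted(by_tag):
        group = sorted(by_tag[tag], key=lambda c: c.case_id)
        rng.shuffle(group)
        # Reserve at least one holdout case per tag when the tag has 2+ cases.
        n_holdout = max(1, int(len(group) * holdout_fraction)) if len(group) > 1 else 0
        holdout.extend(group[:n_holdout])
        optimization.extend(group[n_holdout:])
    return optimization, holdout


def pass_rate_by_tag(cases, passed_ids):
    """Return {tag: fraction of that tag's cases whose id is in passed_ids}."""
    totals, passes = defaultdict(int), defaultdict(int)
    for case in cases:
        totals[case.tag] += 1
        passes[case.tag] += case.case_id in passed_ids
    return {tag: passes[tag] / totals[tag] for tag in totals}


def accept_change(baseline, candidate, tolerance=0.0):
    """Accept only if no tag regresses beyond tolerance and some tag improves."""
    no_regression = all(
        candidate.get(tag, 0.0) >= rate - tolerance for tag, rate in baseline.items()
    )
    improved = any(candidate.get(tag, 0.0) > rate for tag, rate in baseline.items())
    return no_regression and improved
```

In a sketch like this, the harness would be tuned against the optimization set only, and `accept_change` would be applied to per-tag pass rates measured on the holdout set, so a change that merely memorizes the tuning cases does not get accepted.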