Improving Deep Agents with harness engineering
Blog post from LangChain
Harness engineering at LangChain optimizes coding-agent performance by refining the systems around a fixed model (here, gpt-5.2-codex) rather than the model itself. By iterating on the system prompt, tools, and middleware, the team raised their agent's score from 52.8% to 66.5% on Terminal Bench 2.0, a benchmark that evaluates agentic coding across varied tasks.

Key strategies include automated trace analysis to identify and fix recurring errors, stronger self-verification so the agent checks its own work, and more efficient context delivery. The approach underscores the importance of guiding agents to write testable code and of managing computational resources efficiently. Improvement is iterative, combining automated tooling with human intervention to keep agents out of unproductive loops and to let them autonomously adjust their reasoning and problem-solving strategies.

Looking ahead, the exploration of multi-model systems and memory primitives for continual learning suggests promising directions for harness engineering, with context engineering, self-verification, and adaptive reasoning as its fundamental principles.
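To make the self-verification idea concrete, here is a minimal sketch of how a harness might gate an agent's code behind its own tests before accepting it, feeding failures back into the next attempt. All names here (`verify_patch`, `self_verify_loop`, `StepResult`) are illustrative assumptions, not LangChain APIs, and the `generate` callable stands in for a model call.

```python
# Hypothetical self-verification loop for an agent harness.
# Names are illustrative; this is not LangChain's actual implementation.
import subprocess
import sys
import tempfile
from dataclasses import dataclass


@dataclass
class StepResult:
    code: str       # the last candidate the model produced
    accepted: bool  # whether it passed its tests
    feedback: str   # failure output fed back to the model


def verify_patch(code: str, test_code: str) -> tuple[bool, str]:
    """Run candidate code plus its tests in a subprocess; return (passed, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + test_code)
        path = f.name
    proc = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=30
    )
    return proc.returncode == 0, proc.stderr


def self_verify_loop(generate, test_code: str, max_attempts: int = 3) -> StepResult:
    """Ask the model for code, verify it, and surface failures on retry.

    `generate` is a stand-in for a model call: it takes the previous
    failure output (empty string on the first try) and returns code.
    Capping attempts keeps the agent out of unproductive loops.
    """
    feedback = ""
    code = ""
    for _ in range(max_attempts):
        code = generate(feedback)
        ok, err = verify_patch(code, test_code)
        if ok:
            return StepResult(code, True, "")
        feedback = err  # let the next attempt see what went wrong
    return StepResult(code, False, feedback)
```

For example, a stubbed "model" that first returns a buggy `add` and then a correct one would fail verification once, receive the assertion error as feedback, and have its second attempt accepted.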