
Evaluating Deep Agents: Our Learnings

Blog post from LangChain

Post Details
- Company: LangChain
- Date Published: -
- Author: -
- Word Count: 1,661
- Language: English
- Hacker News Points: -
Summary

LangChain recently built and deployed four applications on their Deep Agents harness: a coding agent, LangSmith Assist, a personal email assistant, and a no-code agent-building platform. These applications required evaluation patterns specific to Deep Agents, in which each data point carries bespoke test logic because every instance has its own success criteria. Evaluations run at three granularities: single-step tests that validate an immediate decision, full agent turns that assess a complete execution, and multi-turn tests that simulate extended user interactions; each requires a clean environment for reproducibility. LangSmith's integrations support these evaluations by allowing detailed assertions on agent behavior, including the trajectory, the final response, and other generated state, and by providing tools to manage complex evaluation environments efficiently.
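The core pattern described above, attaching bespoke test logic to each data point and asserting on the trajectory as well as the final response, can be sketched in plain Python. Everything here is illustrative: the agent, its trajectory format, and the helper names are assumptions for the sketch, not the LangSmith API.

```python
# Minimal sketch of per-datapoint evaluation with bespoke test logic.
# The agent, trajectory schema, and tool names below are hypothetical.

def fake_agent(inputs):
    """Stand-in for one full agent turn: returns a trajectory and a final answer."""
    return {
        "trajectory": [
            {"tool": "search_files", "args": {"query": inputs["query"]}},
            {"tool": "write_file", "args": {"path": "notes.md"}},
        ],
        "final_response": f"Done: handled '{inputs['query']}'",
    }

def evaluate_datapoint(agent, datapoint):
    """Run the agent once in a fresh call and apply this datapoint's own check."""
    outputs = agent(datapoint["inputs"])
    return datapoint["check"](outputs)

# Each data point carries its own success criteria as a callable,
# since no single generic assertion fits every instance.
datapoint = {
    "inputs": {"query": "fix flaky test"},
    "check": lambda out: (
        # Trajectory assertion: the agent consulted the codebase first.
        out["trajectory"][0]["tool"] == "search_files"
        # Final-response assertion.
        and "Done" in out["final_response"]
    ),
}

print(evaluate_datapoint(fake_agent, datapoint))
```

A single-step variant of this sketch would call only the agent's first decision and assert on that one tool choice, while a multi-turn variant would loop `evaluate_datapoint` over a scripted sequence of user inputs, resetting the environment between runs.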