LangChain recently built and deployed four applications on its Deep Agents harness: a coding agent, LangSmith Assist, a personal email assistant, and a no-code agent-building platform. Building them required evaluation patterns specific to Deep Agents. Because each data point has its own success criteria, each one needs its own test logic. Evaluations can run at three levels: single-step tests that validate the agent's immediate decision, full agent turns that assess a complete execution, and multi-turn tests that simulate longer user interactions. Each level requires a clean environment so that runs are reproducible. LangSmith's integrations support these evaluations by allowing detailed assertions on agent behavior, including the trajectory, the final response, and other generated state, and by offering tools for handling complex evaluation environments efficiently.
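To make the idea of per-data-point test logic concrete, here is a minimal sketch in plain Python. It does not use the Deep Agents or LangSmith APIs: `toy_agent`, `AgentResult`, `Case`, and the check functions are all hypothetical names invented for illustration. It shows only the full-turn level, with each case carrying its own assertions on trajectory, final response, and generated state, and each case starting from a fresh environment.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentResult:
    trajectory: list[str]    # names of tools called, in order
    final_response: str      # text returned to the user
    files: dict[str, str]    # other generated state (a fake file system)


def toy_agent(message: str, files: dict[str, str]) -> AgentResult:
    """Hypothetical stand-in for a real agent; not the Deep Agents API."""
    trajectory: list[str] = []
    if "remember" in message.lower():
        trajectory.append("write_file")
        files["memory.md"] = message
        response = "Saved to memory."
    else:
        response = "Nothing to save."
    return AgentResult(trajectory, response, files)


@dataclass
class Case:
    name: str
    message: str
    check: Callable[[AgentResult], None]  # bespoke success criteria


def check_memory_written(result: AgentResult) -> None:
    # Assert on trajectory, generated state, and final response.
    assert result.trajectory == ["write_file"]
    assert "memory.md" in result.files
    assert "saved" in result.final_response.lower()


def check_no_write(result: AgentResult) -> None:
    # This case succeeds only if the agent leaves the environment untouched.
    assert result.trajectory == []
    assert result.files == {}


CASES = [
    Case("saves_preference", "Remember that I prefer short emails", check_memory_written),
    Case("small_talk", "Hello there", check_no_write),
]


def run_cases(cases: list[Case]) -> dict[str, bool]:
    """Run each case in a clean environment and record pass/fail."""
    outcomes: dict[str, bool] = {}
    for case in cases:
        fresh_files: dict[str, str] = {}  # clean state per case
        result = toy_agent(case.message, fresh_files)
        try:
            case.check(result)
            outcomes[case.name] = True
        except AssertionError:
            outcomes[case.name] = False
    return outcomes


if __name__ == "__main__":
    print(run_cases(CASES))
```

Pairing each input with its own check function, instead of scoring every case with one shared evaluator, is what lets two cases assert opposite things about the same piece of state. In a real setup the stub would be replaced by an actual agent invocation, and the fresh dictionary by whatever isolation the environment needs.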