LangSmith has enhanced its regression testing experience for AI applications, emphasizing that evaluating large language model (LLM) applications is what lets AI engineers iterate with confidence. Unlike traditional software testing, where a test simply passes or fails, testing LLM applications means tracking performance over time and comparing individual datapoints across runs, because model outputs are inherently variable.

LangSmith's new infrastructure supports this workflow with a Comparison View, which lets users select multiple experiment runs and compare them side by side, with configurable levels of detail such as output text and latency. The view also highlights changes in evaluation metrics relative to a baseline run, so users can filter down to the datapoints where behavior shifted most. The ability to manually inspect and compare these datapoints is emphasized as crucial for gaining insight and improving model performance, and LangSmith's tools are designed to support this iterative exploration.
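As a minimal sketch of how two experiments end up in the Comparison View, the snippet below uses the LangSmith Python SDK's `evaluate` helper to run a baseline and a candidate application over the same dataset. The dataset name, the placeholder model functions, and the exact-match evaluator are illustrative assumptions, not details from the post; the side-by-side comparison itself then happens in the LangSmith UI.

```python
# Sketch: produce two experiments over one dataset so they can be compared
# against each other (e.g. baseline vs. candidate) in LangSmith's Comparison View.
from langsmith import Client
from langsmith.evaluation import evaluate

client = Client()  # reads the LangSmith API key from the environment

# Assumption: a dataset of question/answer examples already exists in LangSmith.
dataset_name = "qa-regression-suite"


def exact_match(run, example):
    """Toy custom evaluator: score 1 if the output matches the reference answer."""
    predicted = (run.outputs or {}).get("output", "")
    expected = (example.outputs or {}).get("answer", "")
    return {"key": "exact_match", "score": int(predicted.strip() == expected.strip())}


def baseline_app(inputs: dict) -> dict:
    # Placeholder for the current production chain or model call.
    return {"output": f"baseline answer to: {inputs['question']}"}


def candidate_app(inputs: dict) -> dict:
    # Placeholder for the new prompt or model being evaluated.
    return {"output": f"candidate answer to: {inputs['question']}"}


# Each call creates one experiment linked to the same dataset, so LangSmith can
# align results datapoint by datapoint when the runs are opened together.
evaluate(baseline_app, data=dataset_name, evaluators=[exact_match],
         experiment_prefix="baseline")
evaluate(candidate_app, data=dataset_name, evaluators=[exact_match],
         experiment_prefix="candidate")
```

Because both experiments share a dataset, selecting them in the Comparison View lines up each example's outputs and metrics, and the baseline experiment can be used as the reference when highlighting metric changes.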