LangChain's LangSmith platform introduces Test Run Comparisons, a feature aimed at improving the evaluation of large language model (LLM) applications by addressing the difficulty of quantitatively assessing changes to prompts, chains, or agents. The feature lets users manually inspect and compare multiple test runs over the same dataset, with an interface that displays inputs, reference outputs, actual outputs, and evaluation metrics. Users can apply filters to focus on the most significant differences between test runs, making it easier to see how a change affects behavior and to understand the LLM's performance on specific tasks. By supporting side-by-side comparison and deeper exploration of individual datapoints, LangSmith aims to provide the kind of infrastructure for manual data inspection that successful AI researchers and engineers rely on. LangSmith is currently in private beta, and the team invites feedback as it expands access and introduces more features.
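To make the comparison workflow concrete, here is a minimal sketch of how two test runs over the same dataset might be produced with the LangSmith Python SDK, so that the Test Run Comparisons view can then line them up side by side. This is an illustration under assumptions, not the post's own code: the dataset name, example content, evaluator, and the `app_v1`/`app_v2` target functions are all hypothetical stand-ins for two versions of an application.

```python
# Sketch: producing two test runs (experiments) on one LangSmith dataset,
# assuming the langsmith Python SDK. All names below are hypothetical.
from langsmith import Client, evaluate

client = Client()

# Hypothetical dataset of question -> reference-answer pairs.
dataset = client.create_dataset("qa-smoke-test")
client.create_examples(
    inputs=[{"question": "What does LangSmith help with?"}],
    outputs=[{"answer": "Tracing and evaluating LLM applications."}],
    dataset_id=dataset.id,
)

def exact_match(run, example) -> dict:
    # Toy evaluator: score 1 if the produced answer matches the reference exactly.
    predicted = (run.outputs or {}).get("answer", "")
    reference = (example.outputs or {}).get("answer", "")
    return {"key": "exact_match", "score": int(predicted.strip() == reference.strip())}

def app_v1(inputs: dict) -> dict:
    # Stand-in for the original prompt/chain/agent being evaluated.
    return {"answer": "Tracing and evaluating LLM applications."}

def app_v2(inputs: dict) -> dict:
    # Stand-in for a revised version whose effect you want to assess.
    return {"answer": "Observability for LLM apps."}

# Each call creates a separate test run against the same dataset; the
# comparison UI can then show their outputs and metrics side by side.
evaluate(app_v1, data="qa-smoke-test", evaluators=[exact_match], experiment_prefix="app-v1")
evaluate(app_v2, data="qa-smoke-test", evaluators=[exact_match], experiment_prefix="app-v2")
```

With two or more such runs recorded against one dataset, the comparison view described above is where filtering and per-datapoint inspection happen; the SDK's role here is simply to generate the runs being compared.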