Company
Date Published
Author
Ornella Altunyan
Word count
582
Language
English
Hacker News points
None

Summary

Bryan Cox and Ankur Goyal hosted a webinar titled "In the Loop: Technical Q&A" in which they discussed best practices for evals. Bryan emphasized keeping evals simple, starting with around 10 examples, and building a feedback loop that allows quick iteration. They also introduced several scoring functions that teams commonly start with, including Levenshtein distance, factuality, and closed QA. Other topics included Braintrust's approach to multi-step prompt chaining, integration with continuous integration (CI), handling user feedback, evaluating multimodal data, balancing automated scoring with human review, the Brainstore logging database, and the potential use of synthetic data in evals. Looking ahead, they expect evals to involve automating more tasks, aligning AI outputs with human expectations, and bringing in more team members beyond just engineers.
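
To make the "start simple" advice concrete, here is a minimal sketch of a Levenshtein-based scorer and a tiny eval loop in plain Python. It is not Braintrust's API: the function names (`levenshtein`, `levenshtein_score`, `run_eval`), the example data, and the choice to normalize the score to the range 0 to 1 are all assumptions made for illustration.

```python
from typing import Callable


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_score(output: str, expected: str) -> float:
    """Hypothetical normalization: 1.0 means identical, 0.0 means fully different."""
    longest = max(len(output), len(expected))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(output, expected) / longest


def run_eval(task: Callable[[str], str], examples: list[dict]) -> float:
    """Run the task on every example and return the mean score."""
    scores = [
        levenshtein_score(task(example["input"]), example["expected"])
        for example in examples
    ]
    return sum(scores) / len(scores)


# Illustrative data only; a real eval would start with around 10 examples.
examples = [
    {"input": "hello", "expected": "HELLO"},
    {"input": "evals", "expected": "EVALS"},
]
average = run_eval(str.upper, examples)
```

Rerunning `run_eval` after each prompt or code change gives the quick feedback loop described above. Factuality and closed QA scorers typically rely on a model acting as a judge, so they are not sketched here.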