Company:
Date Published:
Author: Ornella Altunyan, Winnie Tam, Sophie Gao
Word count: 1110
Language: English
Hacker News points: None

Summary

Coursera has built a structured evaluation process to quickly ship reliable AI features that customers love. The company adopted large language models to enhance its user experience, most notably the Coursera Coach chatbot and AI-assisted grading tools, but soon recognized the need for a better evaluation workflow: before a formal framework existed, teams relied on fragmented offline jobs, spreadsheets, and manual human-labeling processes, which made it difficult to validate AI features and push them to production with confidence. The business impact of these features is measurable. Coursera Coach serves as a 24/7 learning assistant and source of psychological support for students, maintaining a 90% learner satisfaction rating, while automated grading addresses a critical scaling challenge in Coursera's educational model. To evaluate AI features, Coursera uses a four-step approach: defining clear evaluation criteria upfront, curating targeted datasets, implementing both heuristic and model-based scorers, and running evaluations and iterating rapidly; a sketch of this loop follows below. This structured framework has transformed Coursera's AI development process, raising confidence in what ships, shortening the path from concept to release, and enabling more comprehensive testing.
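
To make the four-step approach concrete, here is a minimal, framework-agnostic Python sketch of an evaluation loop of the kind described. It is an illustration, not Coursera's actual code: the dataset, the scorer names, and the stand-in grader and judge functions are all hypothetical. It shows criteria encoded as explicit scorers (one heuristic, one model-based), a small curated dataset, and a run step that aggregates scores so each iteration can be compared against the last.

```python
# Hypothetical sketch of the four-step eval loop: criteria -> dataset
# -> scorers -> run & iterate. No names here come from Coursera's code.

from dataclasses import dataclass
from typing import Callable


# Step 1: evaluation criteria, made explicit as named scorers.

def heuristic_scorer(output: str, expected: str) -> float:
    """Cheap deterministic check: exact match on the grade label."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def model_based_scorer(output: str, expected: str,
                       judge_fn: Callable[[str], str]) -> float:
    """LLM-as-judge check. judge_fn is any callable that takes a prompt
    and returns a yes/no answer (e.g. a wrapper around a model API)."""
    prompt = (f"Does the output '{output}' convey the same grade as the "
              f"reference '{expected}'? Answer yes or no.")
    return 1.0 if judge_fn(prompt).strip().lower().startswith("yes") else 0.0


# Step 2: a small, targeted dataset of failure-prone cases.

@dataclass
class Case:
    submission: str
    expected_grade: str


dataset = [
    Case("The mitochondria is the powerhouse of the cell.", "pass"),
    Case("I don't know.", "fail"),
]


# Steps 3-4: run every scorer over every case, aggregate, then iterate
# on prompts or models until the scores are acceptable.

def run_eval(grade_fn: Callable[[str], str],
             judge_fn: Callable[[str], str]) -> dict:
    scores = {"heuristic": [], "model_based": []}
    for case in dataset:
        output = grade_fn(case.submission)
        scores["heuristic"].append(
            heuristic_scorer(output, case.expected_grade))
        scores["model_based"].append(
            model_based_scorer(output, case.expected_grade, judge_fn))
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}


if __name__ == "__main__":
    # Stand-ins so the sketch runs without an API key: a trivial grader
    # and a judge that always agrees. In practice these would be the
    # production grading pipeline and a real judge-model call.
    def fake_grader(text: str) -> str:
        return "fail" if "don't know" in text else "pass"

    def fake_judge(prompt: str) -> str:
        return "yes"

    print(run_eval(fake_grader, fake_judge))
```

Keeping the scorers as plain callables over a shared dataset is what enables the "iterate rapidly" step: swapping in a new prompt or model only changes `grade_fn`, so successive runs produce directly comparable aggregate scores.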