
Online evals in AI Configs is now GA

Blog post from LaunchDarkly

Post Details

Company: LaunchDarkly
Date Published:
Author: Kelvin Yap
Word Count: 660
Language: English
Hacker News Points: -
Summary

Online evaluations, now generally available in AI Configs, automatically assess AI output quality using large language models (LLMs) as judges, and the release adds customizable judges that let teams define their own criteria for what counts as "good" output. This flexibility lets teams tailor evaluations to their specific needs, ensuring that AI behavior stays aligned with the intended experience and the policy boundaries of their industry, brand, or workflow. For instance, a banking chatbot must maintain a professional tone to build trust, avoiding casual language that, while accurate, could undermine user confidence. Custom judges let teams score outputs against these nuanced requirements, producing actionable signals during rollouts so they can pause or revert a change if necessary. The resulting scores complement existing release metrics such as latency and cost, and judges are managed through the same workflow as other AI Configs, so evaluation criteria can be iterated on and refined continuously.
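
To make the judge-scoring idea concrete, here is a minimal sketch of the general LLM-as-judge pattern the post describes: a custom criterion is expressed as a judging prompt, each output gets a score, and an aggregate score can gate a rollout. This is not LaunchDarkly's AI Configs API; the JUDGE_PROMPT wording, the call_llm() helper, the 1-5 scale, and the threshold value are all illustrative assumptions.

```python
# Illustrative sketch of an LLM-as-judge custom evaluation.
# Not LaunchDarkly's API: JUDGE_PROMPT, call_llm(), the 1-5 scale,
# and the pause threshold are assumptions made for this example.

JUDGE_PROMPT = """You are evaluating a banking chatbot reply.
Criterion: the reply must use a professional, trust-building tone and avoid
casual or slang language, even when the content is factually correct.
Score the reply from 1 (fails the criterion) to 5 (fully meets it).
Reply with the number only.

Chatbot reply:
{output}
"""


def call_llm(prompt: str) -> str:
    """Placeholder for a call to whichever judge model the team chooses."""
    raise NotImplementedError("wire this to your LLM provider")


def judge_tone(output: str) -> int:
    """Ask the judge model to score one output against the custom criterion."""
    raw = call_llm(JUDGE_PROMPT.format(output=output))
    return int(raw.strip())


def should_pause_rollout(outputs: list[str], threshold: float = 4.0) -> bool:
    """Flag the rollout for pause/revert if the average judge score drops below threshold."""
    scores = [judge_tone(o) for o in outputs]
    return sum(scores) / len(scores) < threshold
```

In the product itself, these judge scores surface alongside metrics like latency and cost during a release, so the pause-or-revert decision sketched above is made in the same workflow used to manage the rest of an AI Config.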