Introducing online evals in Pydantic Logfire
Pydantic Logfire's online evaluations score AI agents in real time on live production data, complementing the offline evaluations you run during development. You attach evaluators to functions or agents, sample a slice of production traffic, and review the results in the Logfire UI, giving continuous visibility into metrics such as hallucination rate, tool-use accuracy, and response quality.

Online evaluations use the same Evaluator classes as offline ones, so scoring criteria stay consistent from development through production and you get immediate feedback on how a deployment actually behaves, making it easy to spot regressions or improvements after a release. Because eval results flow through the same OpenTelemetry pipeline as the rest of your trace data, the Logfire UI can show trend lines and filter events on them, turning evaluations into a queryable surface rather than just a dashboard. The sketches below walk through the pieces: defining an evaluator, instrumenting an agent so its spans reach the pipeline, and querying scores back out.

Online evaluations don't replace human review; they streamline it. Reviewers can focus on the low-scoring traces and edge cases the evaluators surface, refining both agent behavior and evaluator accuracy over time.
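To make the shared-Evaluator point concrete, here is a minimal sketch of a custom evaluator written with pydantic-evals. The Evaluator and EvaluatorContext classes and the Dataset/Case offline workflow are the library's real API; the CitesSource class, its scoring rule, and the stand-in answer function are illustrative assumptions, not taken from the post.

```python
from dataclasses import dataclass

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import Evaluator, EvaluatorContext


@dataclass
class CitesSource(Evaluator):
    """Score 1.0 when the agent's output contains at least one URL."""

    def evaluate(self, ctx: EvaluatorContext) -> float:
        output = str(ctx.output)
        return 1.0 if ('http://' in output or 'https://' in output) else 0.0


def answer(question: str) -> str:
    # Stand-in for the real agent call.
    return 'Guido van Rossum; see https://gvanrossum.github.io/'


# Offline use during development: run the same evaluator over a fixed dataset.
dataset = Dataset(
    cases=[Case(name='basic', inputs='Who created Python?')],
    evaluators=[CitesSource()],
)
report = dataset.evaluate_sync(answer)
report.print()
```

The same class is what you would point an online evaluation at, which is what keeps development-time and production-time scoring criteria identical.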
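The attaching and sampling side is configured in Logfire itself; what your code needs to do is emit spans into Logfire's OpenTelemetry pipeline so there is live data to score. A sketch, assuming a Pydantic AI agent and logfire's built-in instrumentation (the model name and prompts are placeholders):

```python
import logfire
from pydantic_ai import Agent

logfire.configure()  # picks up the write token from the environment
logfire.instrument_pydantic_ai()  # emit an OpenTelemetry span per agent run

agent = Agent('openai:gpt-4o', system_prompt='Answer concisely and cite sources.')


@logfire.instrument('answer_question')
def answer(question: str) -> str:
    return agent.run_sync(question).output
```

Every call to answer now produces a trace in Logfire, and an online evaluator attached to that function or agent can sample and score those traces as they arrive.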
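Because scores land in the same trace store as everything else, they can be pulled back out with SQL rather than only viewed on a dashboard. The snippet below uses Logfire's experimental query client; the eval.name and eval.score attribute keys are assumptions for illustration, since the post doesn't spell out the schema.

```python
from logfire.experimental.query_client import LogfireQueryClient

# Hypothetical attribute keys -- check your project's span attributes
# in the Logfire UI for the actual keys the online evals write.
SQL = """
SELECT start_timestamp, attributes->>'eval.score' AS score
FROM records
WHERE attributes->>'eval.name' = 'CitesSource'
ORDER BY start_timestamp DESC
LIMIT 50
"""

with LogfireQueryClient(read_token='<your-read-token>') as client:
    result = client.query_json_rows(sql=SQL)
    print(result)  # row-oriented JSON: one object per matching span
```

The same queries can drive alerts or feed a review queue, which is how the low-scoring traces end up in front of human reviewers.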