Introducing Agentic Evaluations

Post Details

Company

Galileo

Date Published

Jan. 23, 2025

Author

Quique Lores

Word Count

661

Language

English

Hacker News Points

-

Source URL

galileo.ai/blog/introducing-agentic-evaluations

Summary

Galileo has released Agentic Evaluations, a framework for evaluating agentic applications so that developers can ship reliable, resilient agents faster. It addresses the difficulty of evaluating agents with agent-specific metrics, updated tracing, and granular cost and error tracking. Traditional GenAI metrics focus on the final response; Agentic Evaluations instead examines each of the steps in an agent's decision-making process, so developers can pinpoint where a run needs improvement and measure overall application health. The framework includes proprietary LLM-as-a-Judge metrics, tested and refined through research and customer feedback, and a visualization that groups an entire trace into a single view in which individual nodes can be expanded. Galileo positions the release as a way to shorten time-to-production for reliable, scalable agentic apps.
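
To make the idea of step-level evaluation concrete, here is a minimal sketch of how per-step cost and error data might roll up into trace-level health numbers. It is not Galileo's API or schema: the `Step` and `Trace` classes, the `summarize` function, and the sample step names and costs are all hypothetical, chosen only to illustrate the difference between scoring a final response and inspecting every node in a trace.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    """One node in an agent trace (hypothetical structure, not Galileo's schema)."""
    name: str
    kind: str                      # e.g. "llm" or "tool"
    cost_usd: float = 0.0
    error: Optional[str] = None    # None means the step succeeded


@dataclass
class Trace:
    """All steps an agent took to handle a single request."""
    steps: list[Step] = field(default_factory=list)


def summarize(trace: Trace) -> dict:
    """Aggregate step-level cost and errors into trace-level totals."""
    failed = [s.name for s in trace.steps if s.error is not None]
    return {
        "num_steps": len(trace.steps),
        "total_cost_usd": round(sum(s.cost_usd for s in trace.steps), 6),
        "failed_steps": failed,
        "error_rate": len(failed) / len(trace.steps) if trace.steps else 0.0,
    }


# Illustrative data only: a three-step run in which the tool call fails.
trace = Trace(steps=[
    Step("plan", "llm", cost_usd=0.002),
    Step("search_flights", "tool", error="timeout"),
    Step("answer", "llm", cost_usd=0.003),
])
summary = summarize(trace)
print(summary)
```

A final-response metric would see only the output of the `answer` step; the summary above also surfaces that `search_flights` failed, which is the kind of intermediate signal the post says agent-level evaluation is meant to expose.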