Company
Date Published
Author
Sanjana Yeddula
Word count
583
Language
English
Hacker News points
None

Summary

The text discusses the importance and methodology of trace-level evaluations for Large Language Model (LLM) applications, as opposed to the more common span-level evaluations. While span-level assessments focus on individual steps such as tool calls or LLM responses, trace-level evaluations consider the entire workflow, assessing the success, efficiency, and relevance of the final outcome. The tutorial demonstrates these evaluations with Arize AX, using a movie recommendation agent that calls multiple tools to answer user queries as its running example. By evaluating the full sequence of steps rather than each step in isolation, trace-level evaluations reveal whether issues stem from a specific component or from the overall process. This approach is particularly valuable for multi-step workflows and multi-agent systems, where it helps ensure end-to-end reliability and relevance.
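To make the pattern concrete, below is a minimal, generic sketch of a trace-level evaluation: the full sequence of spans is flattened into a single transcript and an LLM judge scores the end-to-end outcome rather than any single step. This is not Arize AX's actual API; the trace contents, the `evaluate_trace` helper, the judge prompt, and the `gpt-4o-mini` model name are illustrative assumptions, and in practice the tutorial performs this evaluation through Arize AX's own tooling.

```python
from openai import OpenAI

# Hypothetical trace for a movie recommendation agent: each entry is one
# span (a tool call or LLM step), listed in execution order.
trace = [
    {"step": "router_llm", "input": "Find me a feel-good movie like Paddington",
     "output": "route to: search_movies, get_reviews"},
    {"step": "search_movies", "input": "feel-good family films similar to Paddington",
     "output": "['Paddington 2', 'Chef', 'School of Rock']"},
    {"step": "get_reviews", "input": "Paddington 2",
     "output": "99% critic score; praised as warm and uplifting"},
    {"step": "final_llm", "input": "compose recommendation",
     "output": "I recommend Paddington 2 ..."},
]

JUDGE_PROMPT = """You are evaluating an AI agent's ENTIRE workflow, not a single step.
Given the user's request and the full sequence of steps below, answer with exactly
one word, "correct" or "incorrect": did the final outcome fully and efficiently
satisfy the user's request?

User request: {request}

Trace:
{trace_text}
"""

def evaluate_trace(request: str, spans: list[dict], model: str = "gpt-4o-mini") -> str:
    """Run a trace-level LLM-as-judge evaluation over the whole span sequence."""
    trace_text = "\n".join(
        f"{i + 1}. [{s['step']}] input: {s['input']} -> output: {s['output']}"
        for i, s in enumerate(spans)
    )
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(request=request, trace_text=trace_text),
        }],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()

label = evaluate_trace("Find me a feel-good movie like Paddington", trace)
print(f"trace-level evaluation: {label}")
```

Because the judge sees every span at once, a failure label points to a breakdown in the overall process (for example, an unnecessary tool call or an irrelevant final answer) even when each individual span would pass a span-level check.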