Company
Arize
Date Published
Author
Sanjana Yeddula
Word count
563
Language
English
Hacker News points
None

Summary

Session-level evaluations assess AI applications across entire multi-turn interactions rather than isolated tool calls or individual model responses, giving a holistic view of the user experience. Using the Arize Python SDK, developers implement them by grouping traces into sessions via session IDs, where each session represents a full conversation, such as one between a user and a chatbot. Evaluating at this level surfaces session correctness, user frustration, and goal achievement: whether the AI actually helped the user, stayed accurate, and avoided causing dissatisfaction. The workflow involves attaching session or user IDs to spans in the application code, preparing data for evaluation with Arize AX's Export Client, running evaluations with LLM-as-a-judge templates, and logging results back to Arize for visualization and further analysis. This lets developers drill into unsuccessful sessions, identify user frustration, and assess model performance across multiple interactions, improving the AI system's overall efficacy. Illustrative sketches of each step follow below.
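
For the instrumentation step, a minimal sketch of attaching session and user IDs to spans, assuming the application is already traced with OpenInference/OpenTelemetry instrumentation; the run_chatbot helper is a hypothetical stand-in for the real app entry point:

```python
from openinference.instrumentation import using_attributes

def run_chatbot(message: str) -> str:
    # Hypothetical stand-in for the instrumented LLM call; spans created
    # here inherit the session/user attributes from the surrounding context.
    return f"echo: {message}"

def handle_turn(session_id: str, user_id: str, message: str) -> str:
    # Every span emitted inside this block carries session.id and user.id,
    # which Arize uses to group multi-turn traces into a single session.
    with using_attributes(session_id=session_id, user_id=user_id):
        return run_chatbot(message)
```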
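To prepare data for evaluation, a sketch using the Arize Export Client; the space and project identifiers are placeholders, and the flattened column names (attributes.session.id and friends) should be verified against your exported dataframe:

```python
from datetime import datetime, timedelta

from arize.exporter import ArizeExportClient
from arize.utils.types import Environments

# Expects an API key, e.g. via the ARIZE_API_KEY environment variable
# or the api_key keyword argument.
client = ArizeExportClient()

# Export the last 24 hours of tracing data for the project.
df = client.export_model_to_df(
    space_id="YOUR_SPACE_ID",        # placeholder
    model_id="YOUR_PROJECT_NAME",    # placeholder
    environment=Environments.TRACING,
    start_time=datetime.now() - timedelta(days=1),
    end_time=datetime.now(),
)

# Group spans by session ID so each row holds one full conversation.
sessions = (
    df.groupby("attributes.session.id")
      .agg({"attributes.input.value": list, "attributes.output.value": list})
      .reset_index()
)
```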
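For the evaluation step, a sketch of an LLM-as-a-judge run using llm_classify from the open-source phoenix.evals package; the session-correctness template below is an illustrative stand-in for Arize's prebuilt session-level templates:

```python
from phoenix.evals import OpenAIModel, llm_classify

# Illustrative judge template; Arize ships prebuilt session-level
# templates for correctness, frustration, and goal achievement.
SESSION_CORRECTNESS_TEMPLATE = """
You are judging a multi-turn conversation between a user and an assistant.

Conversation:
{conversation}

Did the assistant correctly accomplish what the user set out to do across
the whole session? Respond with exactly one word: correct or incorrect.
"""

# Flatten each session's turns into a single transcript column that the
# template's {conversation} variable is filled from.
sessions["conversation"] = sessions.apply(
    lambda row: "\n".join(
        f"User: {q}\nAssistant: {a}"
        for q, a in zip(
            row["attributes.input.value"], row["attributes.output.value"]
        )
    ),
    axis=1,
)

eval_results = llm_classify(
    dataframe=sessions,
    template=SESSION_CORRECTNESS_TEMPLATE,
    model=OpenAIModel(model="gpt-4o"),
    rails=["correct", "incorrect"],   # constrain the judge's output
    provide_explanation=True,         # keep the judge's reasoning for debugging
)
```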
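Finally, results can be written back so they appear alongside traces in the Arize UI. The exact call varies by SDK version; this sketch assumes the pandas logger's evaluation-logging method and the eval.<name>.label column convention from recent Arize docs, both of which should be verified against your SDK version:

```python
import os

from arize.pandas.logger import Client

arize_client = Client(
    space_id=os.environ["ARIZE_SPACE_ID"],  # older SDKs use space_key
    api_key=os.environ["ARIZE_API_KEY"],
)

# Arize joins evaluations to spans on context.span_id, with labels and
# explanations in eval.<name>.* columns (naming assumed from Arize docs).
evals_df = eval_results.rename(
    columns={
        "label": "eval.session_correctness.label",
        "explanation": "eval.session_correctness.explanation",
    }
)
# "root_span_id" is a hypothetical column: use whichever span ID you chose
# to represent each session when grouping the exported traces.
evals_df["context.span_id"] = sessions["root_span_id"]

arize_client.log_evaluations_sync(evals_df, "YOUR_PROJECT_NAME")
```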