A New Framework for Evaluating Voice Agents (EVA)

Post Details

Company

Hugging Face

Date Published

March 24, 2026

Authors

Tara Bogavelli, Gabrielle Gauthier Melancon, Katrina Stankiewicz, Nifemi Bamgbose, Hoang Nguyen, Raghav Mehndiratta, Hari Subramani, and Fanny Riols

Word Count

2,147

Company Posts That Month

63

Language

-

Hacker News Points

-

Post removed?

No

Source URL

huggingface.co/blog/ServiceNow-AI/eva

Summary

EVA is a framework for evaluating conversational voice agents on both task accuracy and user experience in multi-turn spoken interactions. Unlike existing evaluation approaches, which treat accuracy and conversational experience separately, EVA assesses both together and reports two primary scores: EVA-A for accuracy and EVA-X for experience. The framework uses a bot-to-bot audio architecture to simulate realistic conversations and evaluates agents with a suite of metrics that combines deterministic code-based checks with LLM-as-Judge methods. The reported results show a consistent tradeoff between task completion and user experience, which supports evaluating voice agents on both dimensions at once. The post also identifies common failure modes, such as named entity transcription errors and difficulties with multi-step workflows. EVA is currently released with a dataset of airline scenarios; the authors plan to expand it to more domains and conditions and to address known limitations, including bias in LLM-as-Judge models and domain-specific constraints.
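
The two-score reporting described in the summary above can be illustrated with a minimal sketch. It is not EVA's implementation: the metric names, the `MetricResult` type, the `summarize` helper, and the use of a plain mean are all assumptions made for illustration. The only idea taken from the post is that deterministic code-based metrics and LLM-as-Judge metrics feed two separate scores, one for accuracy and one for experience.

```python
from dataclasses import dataclass
from statistics import mean

DIMENSIONS = ("accuracy", "experience")


@dataclass(frozen=True)
class MetricResult:
    """One metric outcome for one conversation (hypothetical structure)."""
    name: str        # hypothetical metric name, not taken from EVA
    dimension: str   # "accuracy" or "experience"
    method: str      # "code" (deterministic check) or "llm_judge"
    score: float     # assumed to be normalized to the range [0, 1]


def summarize(results):
    """Return one mean score per dimension, keeping the two dimensions separate.

    Averaging is an assumption; the post does not state how scores are combined.
    """
    by_dimension = {dim: [] for dim in DIMENSIONS}
    for result in results:
        if result.dimension not in by_dimension:
            raise ValueError(f"unknown dimension: {result.dimension!r}")
        if not 0.0 <= result.score <= 1.0:
            raise ValueError(f"score out of range for {result.name!r}")
        by_dimension[result.dimension].append(result.score)
    # Dimensions with no metrics are omitted rather than reported as zero.
    return {dim: mean(scores) for dim, scores in by_dimension.items() if scores}


# Made-up values for a single simulated conversation.
example_results = [
    MetricResult("task_completed", "accuracy", "code", 1.0),
    MetricResult("entity_transcription", "accuracy", "code", 0.5),
    MetricResult("conciseness", "experience", "llm_judge", 0.75),
    MetricResult("turn_taking", "experience", "llm_judge", 0.25),
]
example_summary = summarize(example_results)
print(example_summary)
```

Keeping the two dimensions in separate fields, rather than collapsing them into one number, is what makes a tradeoff between task completion and user experience visible in the first place.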

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM Change
Voice AI	19	2,447	202	43	+13%
LLM	9	6,078	960	218	+18%
Real-time	2	6,457	1,307	242	+28%
