The article explores the development and evaluation of End-of-Turn (EoT) detection models for voice agents, which make interactions more natural and responsive by accurately predicting when a user has finished speaking. Dissatisfied with existing evaluation methods, the authors built a new approach for assessing the performance of various models, including their own model, Flux. They argue for evaluating on full conversations rather than isolated turns, since this more realistically simulates human-agent interactions. The article discusses the difficulty of obtaining accurate timestamps and the benefits of using sequence alignment rather than temporal alignment to refine EoT detection metrics, which results in improved precision and recall. The authors also address the detection of start-of-turn (SoT) events, which is crucial for handling user interruptions in voice agent pipelines, and propose using the start of the first word as the reference point for SoT detection. Finally, they outline future directions for evaluating conversational metrics, highlighting the potential for incorporating semantics and conversation flow into performance assessments.
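
To make the sequence-alignment idea concrete, the following is a minimal sketch of one way EoT precision and recall could be scored without relying on timestamps. It is not the authors' implementation: the `<EOT>` marker token, the function name `eot_precision_recall`, and the choice of Python's `difflib.SequenceMatcher` (a longest-matching-block aligner, not a minimum-edit-distance aligner) are all assumptions made for illustration. The reference and predicted transcripts are treated as token sequences with EoT markers inserted, and a predicted marker counts as correct only if the alignment pairs it with a reference marker.

```python
from difflib import SequenceMatcher

EOT = "<EOT>"  # hypothetical marker token for an end-of-turn event


def eot_precision_recall(reference, hypothesis):
    """Align two token sequences and score EoT markers by aligned position.

    Assumption: both inputs are lists of word tokens with EOT markers
    inserted wherever a turn ends (reference) or was predicted (hypothesis).
    No timestamps are used, only token order.
    """
    matcher = SequenceMatcher(None, reference, hypothesis, autojunk=False)

    # An EoT marker is a true positive only if it falls inside a matched block.
    true_positives = 0
    for block in matcher.get_matching_blocks():
        matched_tokens = reference[block.a : block.a + block.size]
        true_positives += matched_tokens.count(EOT)

    predicted = hypothesis.count(EOT)
    actual = reference.count(EOT)
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# The model fires once too early (after "to") and once correctly at the end.
reference = "i want to book a flight <EOT>".split()
hypothesis = "i want to <EOT> book a flight <EOT>".split()
print(eot_precision_recall(reference, hypothesis))  # (0.5, 1.0)
```

In this toy case the premature marker has no counterpart in the reference, so it counts as a false positive, while the final marker aligns and counts as a true positive. A temporal matcher would instead need a tolerance window around each ground-truth timestamp, which is where inaccurate timestamps can distort the scores.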