This text describes the transition from a rigid, checklist-based evaluation system to a more nuanced, human-centered approach built on Eval Protocol (EP). The initial method, which judged AI-generated images against a simple checklist, proved technically accurate but misaligned with human expectations. To close that gap, the evaluation was reworked around human-preference rubrics such as intent matching, content recognizability, spatial design, user experience, and visual coherence. The new framework prioritized human-like judgment over mere technical compliance, producing more realistic and meaningful scores. Checklist results were then combined with the human-preference assessments to yield a blended score that better reflects real-world quality. The text concludes by advocating codified, reproducible evaluation tests that align with user expectations, underscoring how quickly and flexibly Eval Protocol lets the evaluation process be adapted.
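
A minimal sketch of how such a blended score might be computed, assuming a checklist pass rate and per-criterion human-preference scores on a 0–1 scale. The function and field names, the equal weighting across rubric criteria, and the 30/70 checklist-versus-preference split are illustrative assumptions, not the exact configuration described above or the EP API itself.

```python
from dataclasses import dataclass

# Human-preference rubric criteria named in the text; equal weights are assumed here.
PREFERENCE_CRITERIA = [
    "intent_matching",
    "content_recognizability",
    "spatial_design",
    "user_experience",
    "visual_coherence",
]


@dataclass
class EvaluationResult:
    checklist_score: float                # fraction of checklist items passed, 0.0-1.0
    preference_scores: dict[str, float]   # per-criterion judge scores, 0.0-1.0
    final_score: float                    # blended score


def blend_scores(
    checklist_passed: int,
    checklist_total: int,
    preference_scores: dict[str, float],
    checklist_weight: float = 0.3,        # assumed weighting; the text gives no exact numbers
) -> EvaluationResult:
    """Combine a checklist pass rate with averaged human-preference rubric scores."""
    checklist_score = checklist_passed / checklist_total if checklist_total else 0.0
    # Average the rubric criteria; missing criteria default to 0 to stay conservative.
    preference_score = sum(
        preference_scores.get(name, 0.0) for name in PREFERENCE_CRITERIA
    ) / len(PREFERENCE_CRITERIA)
    final = checklist_weight * checklist_score + (1 - checklist_weight) * preference_score
    return EvaluationResult(checklist_score, preference_scores, final)


if __name__ == "__main__":
    result = blend_scores(
        checklist_passed=9,
        checklist_total=10,
        preference_scores={
            "intent_matching": 0.6,
            "content_recognizability": 0.8,
            "spatial_design": 0.5,
            "user_experience": 0.7,
            "visual_coherence": 0.65,
        },
    )
    print(f"checklist={result.checklist_score:.2f} final={result.final_score:.2f}")
```

Weighting the human-preference average more heavily keeps the score anchored to what users actually perceive, while the checklist term still penalizes outright technical regressions.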