As machine learning models grow more complex and are integrated into a wider range of applications, aggregate evaluation metrics such as Mean Average Precision often fall short of predicting real-world performance, which motivates a data-centric evaluation approach built on model test cases. Encord proposes treating these test cases like unit tests in software engineering: each one assesses model performance under a specific scenario, helping to surface weaknesses and optimize accuracy both before and after deployment. Test cases are defined in terms of specific quality metrics, such as lighting conditions or object size, so that performance can be evaluated at a granular level and failure modes addressed through targeted data collection, relabeling, data augmentation, and synthetic data generation. Encord Active, an open-source toolkit, supports this workflow by enabling custom quality metrics and automated test case evaluation, giving users deeper insight into model performance, helping them prioritize improvement efforts, and fostering better collaboration within the machine learning community.
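To make the unit-test analogy concrete, the sketch below shows one way a model test case might look in plain Python: evaluation results are sliced by a quality metric (here, image brightness) and an assertion enforces a minimum accuracy on that slice. This is a minimal, framework-agnostic illustration; the `Sample` record, the `brightness` metric, and the 0.80 threshold are hypothetical placeholders, and Encord Active's own APIs are not used here.

```python
"""Minimal sketch of a model test case, assuming precomputed quality metrics.

All names and thresholds here are illustrative, not part of any specific API.
"""
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Sample:
    """One evaluation example with a quality metric and a prediction outcome."""
    image_id: str
    brightness: float        # hypothetical quality metric in [0, 1]; low = dark scene
    prediction_correct: bool


def slice_accuracy(samples: Iterable[Sample],
                   condition: Callable[[Sample], bool]) -> float:
    """Accuracy restricted to the data slice selected by `condition`."""
    selected = [s for s in samples if condition(s)]
    if not selected:
        raise ValueError("Test case selected no samples; check the slice definition.")
    return sum(s.prediction_correct for s in selected) / len(selected)


def test_low_light_performance(samples: list[Sample]) -> None:
    """Model test case: accuracy on dark images must not fall below 0.80."""
    acc = slice_accuracy(samples, lambda s: s.brightness < 0.3)
    assert acc >= 0.80, f"Low-light accuracy {acc:.2f} is below the 0.80 threshold"


if __name__ == "__main__":
    # Toy evaluation results standing in for real model predictions.
    results = [
        Sample("img_001", brightness=0.15, prediction_correct=True),
        Sample("img_002", brightness=0.22, prediction_correct=True),
        Sample("img_003", brightness=0.28, prediction_correct=False),
        Sample("img_004", brightness=0.75, prediction_correct=True),
        Sample("img_005", brightness=0.12, prediction_correct=True),
        Sample("img_006", brightness=0.20, prediction_correct=True),
    ]
    test_low_light_performance(results)
    print("Low-light test case passed.")
```

In practice, a tool such as Encord Active would supply the quality-metric values and the slicing, but the core idea is the same: a failing assertion points to a specific scenario (dark images, small objects, etc.) where targeted data collection or augmentation is most likely to pay off.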