Machine learning (ML) engineers are cautioned against relying solely on accuracy to assess classification model performance because of the accuracy paradox, which can produce misleading results, particularly on imbalanced datasets. In such datasets, the majority class is overrepresented, causing models to favor it and neglect the minority class, which is often the more important one in real-world applications.

To address this, ML teams employ strategies such as collecting additional data, undersampling the majority class, oversampling the minority class, and adjusting the loss function to counteract the imbalance. These methods, however, require careful implementation to avoid problems like overfitting.

Evaluating model performance with a diverse set of metrics, such as precision, recall, and specificity, is crucial for gauging true effectiveness, especially when models are deployed in real-world scenarios. Tools like Encord Active support this process by providing data and label quality metrics that help identify and rectify class imbalances, thereby improving model performance and reliability.
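The accuracy paradox described above can be demonstrated with a minimal sketch: on a synthetic dataset that is 95% negative, a degenerate classifier that always predicts the majority class scores 95% accuracy while never detecting a single minority-class example. The data here is illustrative, not from any real benchmark.

```python
# Synthetic imbalanced labels: 95% majority (0), 5% minority (1).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # degenerate model: always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Confusion-matrix counts for the minority-positive convention.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))

recall = tp / (tp + fn)        # sensitivity: minority-class detection rate
specificity = tn / (tn + fp)   # true-negative rate on the majority class

print(accuracy)     # 0.95 -- looks strong
print(recall)       # 0.0  -- the minority class is never detected
print(specificity)  # 1.0
```

The 95% accuracy is entirely an artifact of class frequencies; recall exposes the failure immediately, which is why accuracy alone is insufficient on imbalanced data.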
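Of the rebalancing strategies mentioned, random oversampling is the simplest to sketch: duplicate minority-class examples (sampling with replacement) until the classes are the same size. This toy snippet uses a hypothetical 95/5 dataset; production code would typically reach for a library such as imbalanced-learn instead.

```python
import random

random.seed(0)  # reproducible sampling for this illustration

# Hypothetical imbalanced dataset of (features, label) pairs: 95/5 split.
data = [([float(i)], 0) for i in range(95)] + [([float(i)], 1) for i in range(5)]

majority = [d for d in data if d[1] == 0]
minority = [d for d in data if d[1] == 1]

# Random oversampling: draw minority samples with replacement
# until both classes have equal counts.
extra = random.choices(minority, k=len(majority) - len(minority))
balanced = majority + minority + extra

print(len(balanced))  # 190: 95 majority + 95 minority
```

Note the overfitting risk the text warns about: the duplicated minority rows are exact copies, so a flexible model can memorize them; undersampling avoids duplication but discards majority-class information instead.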
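Adjusting the loss function usually means class weighting: errors on the minority class are made more expensive so the optimizer cannot ignore it. The helper below is a hypothetical hand-rolled class-weighted binary cross-entropy with inverse-frequency weights, shown only to make the idea concrete; frameworks expose equivalents (e.g. a per-class weight argument on their cross-entropy losses).

```python
import math

def weighted_bce(y_true, p_pred, w_pos=1.0, w_neg=1.0):
    """Class-weighted binary cross-entropy (illustrative helper).

    Raising w_pos makes mistakes on the positive (minority) class
    cost more, counteracting the pull of the majority class.
    """
    loss = 0.0
    for y, p in zip(y_true, p_pred):
        w = w_pos if y == 1 else w_neg
        loss += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return loss / len(y_true)

# Inverse-frequency weights for a 95/5 split: n / (2 * n_class).
n, n_pos, n_neg = 100, 5, 95
w_pos = n / (2 * n_pos)   # 10.0 -- minority errors weighted heavily
w_neg = n / (2 * n_neg)   # ~0.526
```

With these weights a model that ignores the minority class incurs a large penalty on every missed positive, pushing the decision boundary back toward balanced behavior without resampling the data at all.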