The article explores how Comet’s confusion matrix helps evaluate classification models, particularly on imbalanced datasets. It demonstrates this with two examples: a fraud detection model trained on a highly imbalanced credit card transaction dataset, and a simple CNN classifying unstructured image data from the CIFAR-100 dataset. A confusion matrix offers a more nuanced view of model performance than accuracy alone, revealing misclassifications and aiding model debugging. The article highlights how high accuracy can be misleading on imbalanced datasets and shows how confusion matrices give a granular, per-class picture of performance, with Comet’s tool making it easy to visualize and inspect misclassified instances. Links to the Comet experiments and Colab Notebooks let readers explore and replicate the experiments discussed.
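
Below is a minimal sketch of the pitfall the article describes: on a roughly 99:1 imbalanced dataset, a model that never predicts the minority class still scores 99% accuracy, and only the confusion matrix exposes the failure. The metrics come from scikit-learn; the `api_key` and `project_name` values are placeholders, and the Comet logging call at the end is an assumption about how the article's experiments record their matrices, not a reproduction of them.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Simulated imbalanced labels: 990 legitimate transactions (0), 10 frauds (1)
y_true = np.array([0] * 990 + [1] * 10)

# A degenerate "model" that labels every transaction as legitimate
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))    # 0.99 -- looks excellent
print(confusion_matrix(y_true, y_pred))  # [[990   0]
                                         #  [ 10   0]] -- every fraud missed

# Logging the same matrix to Comet for interactive inspection;
# credentials below are placeholders, not values from the article.
from comet_ml import Experiment

experiment = Experiment(api_key="YOUR_API_KEY", project_name="fraud-demo")
experiment.log_confusion_matrix(
    y_true=y_true,
    y_predicted=y_pred,
    labels=["legitimate", "fraud"],
)
experiment.end()
```

The accuracy score hides the ten missed frauds entirely, while the second row of the confusion matrix surfaces them immediately; this is the per-class granularity the article leans on.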