Object detection models continue to advance, particularly within the YOLO series, and a comparison of YOLOv8 and YOLOv9 highlights their distinct performance characteristics. Both models were trained on the xView3 dataset, which contains aerial imagery for maritime object detection, to evaluate their robustness and generalization capabilities. YOLOv8 produces more true positives, indicating higher recall, but also more false positives, suggesting a tendency to over-detect. Conversely, YOLOv9 is more conservative: it produces fewer false positives but more false negatives, so it may miss some object instances. A precision-recall curve analysis shows that YOLOv8 generally performs better across confidence thresholds, meaning that it tends to maintain higher precision at a comparable recall despite its larger raw false positive count. However, a comprehensive evaluation should also consider additional metrics such as the F1 score and the IoU distribution, as well as how object dimensions and correlations with specific metrics affect performance. These insights, surfaced on platforms like Encord Active, can guide improvements in model performance.
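To make the trade-off between true positives, false positives, and false negatives concrete, the following minimal sketch computes precision, recall, and F1 from raw detection counts. The class name `DetectionCounts` and the two sets of counts are hypothetical and chosen only for illustration; they are not results from the YOLOv8 and YOLOv9 comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Detection counts for one model at a fixed confidence and IoU threshold."""

    tp: int  # true positives: predictions matched to a ground-truth object
    fp: int  # false positives: predictions with no matching ground-truth object
    fn: int  # false negatives: ground-truth objects the model missed

    @property
    def precision(self) -> float:
        # Fraction of predictions that are correct.
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    @property
    def recall(self) -> float:
        # Fraction of ground-truth objects that were found.
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts (assumptions for illustration, NOT measured results).
# Both models see the same 100 ground-truth objects (tp + fn == 100).
eager_model = DetectionCounts(tp=80, fp=40, fn=20)         # detects more, over-detects more
conservative_model = DetectionCounts(tp=60, fp=10, fn=40)  # fewer false alarms, more misses

for name, counts in [("eager", eager_model), ("conservative", conservative_model)]:
    print(
        f"{name:>12}: precision={counts.precision:.3f} "
        f"recall={counts.recall:.3f} f1={counts.f1:.3f}"
    )
```

Because F1 balances precision and recall, two models with opposite error profiles can score similarly on it, which is one reason to look at the full precision-recall curve and the IoU distribution rather than a single number.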