A CNN was trained on a dataset whose labels were predicted by the CLIP model. It reached a training accuracy of 0.73 but a validation accuracy of only 0.27, a gap of 0.46 that indicates it fit the training data far better than it generalized to unseen data. The model's performance was then evaluated with Encord Active, which provided insight into its strengths and weaknesses. The evaluation showed that the model was overfitting to certain classes and identified image-level annotation quality and image brightness as areas needing improvement. The study highlights the value of a robust evaluation framework such as Encord Active for testing and validating computer vision models.
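The overfitting signal described above can be illustrated with a minimal sketch in plain Python. It does not use or represent the Encord Active API. The helper names `generalization_gap` and `per_class_accuracy` are hypothetical, and the toy labels are invented purely for illustration; only the 0.73 and 0.27 accuracies come from the text.

```python
from collections import defaultdict


def generalization_gap(train_acc, val_acc):
    """Difference between training and validation accuracy."""
    return train_acc - val_acc


def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly predicted samples for each true class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {cls: correct[cls] / total[cls] for cls in total}


# Accuracies reported in the text.
gap = generalization_gap(0.73, 0.27)
print(f"train/val gap: {gap:.2f}")

# Hypothetical toy labels (not from the study): a model biased toward "cat".
y_true = ["cat", "cat", "dog", "dog", "bird", "bird"]
y_pred = ["cat", "cat", "cat", "dog", "cat", "cat"]
acc = per_class_accuracy(y_true, y_pred)
for cls, value in acc.items():
    print(f"{cls}: {value:.2f}")
```

A large gap combined with per-class accuracies that vary widely is the kind of pattern that would point to class-specific overfitting; a dedicated evaluation tool can surface the same breakdown alongside data-quality metrics such as brightness.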