Vector Analysis with Scikit-learn and Bokeh
Blog post from Roboflow
Roboflow's dataset management and annotation tools now expose multimodal CLIP embeddings through the Roboflow API. These embeddings support features such as image similarity search, clustering, and anomaly detection. A tutorial shows how to load a dataset's embeddings from Roboflow, analyze them with the t-SNE algorithm in Scikit-learn, and visualize the results with Bokeh. The core step is reducing the high-dimensional CLIP vectors to two dimensions so they can be drawn as a scatter plot. Because t-SNE tends to place similar images near each other, an image that lands far from the rest of its class, or inside a group of images with a different label, is a good candidate for a labeling error or an unexpected image. The visualization colors each data point by a chosen attribute, such as object type, object count, or data split (for example train, validation, or test), which gives insight into the dataset's composition and where it could be improved. The tutorial encourages readers to explore their own datasets and to customize the provided script to surface further insights, with the aim of improving the data used to train machine learning models.
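The reduce-then-plot workflow described above can be sketched as follows. This is a minimal illustration, not the tutorial's actual script: it does not call the Roboflow API, and instead uses synthetic vectors as stand-ins for CLIP embeddings. The helper names (`make_synthetic_embeddings`, `reduce_to_2d`, `colors_for`, `build_plot`), the class names, the 512-dimensional vector size, and the color palette are all assumptions made for the example.

```python
import numpy as np
from sklearn.manifold import TSNE


def make_synthetic_embeddings(n_per_class=20, dim=512, seed=0):
    """Stand-in for embeddings fetched from an API (hypothetical data)."""
    rng = np.random.default_rng(seed)
    vectors, labels = [], []
    for name in ["cat", "dog", "car"]:  # made-up class names
        center = rng.normal(size=dim)
        # Points of one class are small perturbations of a shared center.
        vectors.append(center + 0.1 * rng.normal(size=(n_per_class, dim)))
        labels += [name] * n_per_class
    names = [f"{label}_{i}.jpg" for i, label in enumerate(labels)]
    return np.vstack(vectors), labels, names


def reduce_to_2d(embeddings, perplexity=30.0, seed=0):
    """Project (n_samples, n_dims) embeddings to 2-D with t-SNE."""
    embeddings = np.asarray(embeddings, dtype=np.float32)
    # scikit-learn requires perplexity < n_samples, so cap it for small sets.
    perplexity = min(perplexity, max(1.0, (len(embeddings) - 1) / 3))
    tsne = TSNE(n_components=2, perplexity=perplexity,
                init="pca", random_state=seed)
    return tsne.fit_transform(embeddings)


def colors_for(labels, palette=("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728")):
    """Assign one palette color per distinct label (cycles if there are more)."""
    classes = sorted(set(labels))
    lookup = {c: palette[i % len(palette)] for i, c in enumerate(classes)}
    return [lookup[label] for label in labels]


def build_plot(coords, labels, names):
    """Build a Bokeh scatter plot with one color per label and hover tooltips."""
    # Imported here so the t-SNE step can be used without Bokeh installed.
    from bokeh.models import ColumnDataSource, HoverTool
    from bokeh.plotting import figure

    source = ColumnDataSource(dict(
        x=coords[:, 0], y=coords[:, 1],
        label=list(labels), name=list(names), color=colors_for(labels),
    ))
    plot = figure(title="t-SNE of image embeddings", width=700, height=500,
                  tools="pan,wheel_zoom,reset")
    plot.scatter("x", "y", source=source, color="color",
                 legend_field="label", size=8)
    plot.add_tools(HoverTool(tooltips=[("image", "@name"), ("label", "@label")]))
    return plot
```

To view the result, pass the output of `reduce_to_2d` together with the labels and image names to `build_plot`, then hand the returned figure to `bokeh.plotting.show`. Coloring by a different attribute, such as data split or object count, only requires passing a different `labels` list. Note that t-SNE output depends on `perplexity` and the random seed, so distances between far-apart groups should not be read too literally.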