An Introduction to ImageNet
Blog post from Roboflow
ImageNet is a landmark dataset in computer vision, known for its size and semantic diversity, and was created by researchers from Princeton, Stanford, and UNC Chapel Hill. It was initially designed to populate the WordNet hierarchy with images for each concept; candidate images were gathered through search engines and validated by workers on Amazon Mechanical Turk. Its most popular subset is the one used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which spans 1,000 object classes and contains roughly 1.28 million training images, 50,000 validation images, and 100,000 test images. ImageNet serves as a crucial pretraining corpus and as the benchmark against which advances in image classification are measured, with Google's Vision Transformer (ViT) holding the state-of-the-art record at the time of writing. Subsets of the dataset are accessible on platforms such as Kaggle.
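To make the ILSVRC split sizes more concrete, here is a minimal Python sketch that works out the average number of images per class and each split's share of the whole. The validation, test, and class counts come from the text above; the exact training count of 1,281,167 is an assumption taken from the commonly cited ILSVRC-2012 figure, and the function names are made up for illustration.

```python
# Split sizes for the ILSVRC subset. The validation, test, and class counts
# are the ones quoted in the text; the exact training count is an assumption
# (the commonly cited ILSVRC-2012 figure), since the text gives only a
# rounded number.
NUM_CLASSES = 1000
SPLITS = {"train": 1_281_167, "val": 50_000, "test": 100_000}


def per_class_average(split_sizes, num_classes):
    """Return the average number of images per class for each split."""
    return {name: size / num_classes for name, size in split_sizes.items()}


def split_fractions(split_sizes):
    """Return each split's share of the total image count."""
    total = sum(split_sizes.values())
    return {name: size / total for name, size in split_sizes.items()}


averages = per_class_average(SPLITS, NUM_CLASSES)
fractions = split_fractions(SPLITS)

for name in SPLITS:
    print(
        f"{name}: about {averages[name]:.0f} images per class, "
        f"{fractions[name]:.1%} of all images"
    )
```

The validation and test averages are exact round numbers, while the training average is only an average: individual classes in the training set do not all have the same number of images.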