What is YOLOS? What's New in the Model?
Blog post from Roboflow
YOLOS is an object detection model built on the Vision Transformer (ViT) architecture, carrying the transformer design that first succeeded in natural language processing over to computer vision. Previous YOLO models rely on convolutional neural networks for feature extraction. YOLOS instead uses a Transformer encoder and treats image patches as a sequence, much as a language model treats text tokens. YOLOS does not yet match traditional YOLO models in accuracy: its best-performing variant reaches an Average Precision (AP) of 42.0 on the COCO dataset, below the scores reported for models such as YOLOv7. It is better understood as an early exploration of transformers for object detection, developed for research rather than immediate state-of-the-art performance, and it points to room for future advances in the field.
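The idea of treating image patches as a token sequence can be sketched in a few lines. This is a minimal, dependency-free illustration and not the YOLOS implementation: the function name `image_to_patch_sequence`, the toy 4x4 image, and the patch size of 2 are all assumptions made for the example. It also omits the learned linear projection and position embeddings that a Vision Transformer applies to each flattened patch.

```python
def image_to_patch_sequence(image, patch_size):
    """Split an H x W x C image (nested lists) into a row-major
    sequence of flattened patches, one "token" per patch.

    Hypothetical helper for illustration only.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    sequence = []
    # Walk the image patch by patch: top to bottom, then left to right.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten the patch_size x patch_size x C block into one vector.
            patch = [
                channel_value
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for channel_value in pixel
            ]
            sequence.append(patch)
    return sequence


# Toy 4x4 image with 3 channels; each pixel stores (row, col, 0)
# so the contents of every patch are easy to inspect.
toy_image = [[[r, c, 0] for c in range(4)] for r in range(4)]
tokens = image_to_patch_sequence(toy_image, patch_size=2)

# 4 patches, each flattened to 2 * 2 * 3 = 12 values.
print(len(tokens), len(tokens[0]))
```

Each entry of `tokens` plays the role a word token plays in a language model: the transformer sees an ordered sequence of vectors rather than a 2D grid, which is why the position of each patch has to be supplied separately in a real model.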