
What is YOLOS? What's New in the Model?

Blog post from Roboflow

Post Details
Company: Roboflow
Author: Jacob Solawetz
Word Count: 795
Language: English
Summary

YOLOS is an object detection model built on the Vision Transformer (ViT) architecture, which brings the Transformer design that first proved successful in natural language processing to computer vision. Earlier YOLO models extract features with convolutional neural networks. YOLOS instead uses a Transformer encoder and treats an image as a sequence of patches, much as a language model treats a sentence as a sequence of text tokens. YOLOS does not yet match the accuracy of convolutional YOLO models: its best-performing variant reaches an Average Precision (AP) of 42.0 on the COCO dataset, below the scores reported for models such as YOLOv7. It is better understood as a pioneering exploration of transformers for object detection than as a state-of-the-art detector, and its research-oriented design leaves room for future advances in the field.
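To make "image patches as tokens" concrete, here is a minimal pure-Python sketch of how a ViT-style model might turn an image into a token sequence. It is an illustration under stated assumptions, not the YOLOS implementation: the projection matrix stands in for learned patch-embedding weights, position embeddings and the Transformer layers are omitted, and the extra [DET] tokens follow our reading of the YOLOS paper (which appends 100 learnable detection tokens to the patch sequence). The function names `patchify`, `embed_patches`, and `build_sequence` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Image = List[List[List[float]]]  # indexed as image[row][col][channel]


def patchify(image: Image, patch_size: int) -> Matrix:
    """Split an H x W x C image into non-overlapping square patches and
    flatten each patch into one vector, i.e. one "token"."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    tokens: Matrix = []
    # Visit patches in row-major order, like reading words left to right.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            vector: List[float] = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    vector.extend(image[row][col])
            tokens.append(vector)
    return tokens


def embed_patches(patches: Matrix, weights: Matrix) -> Matrix:
    """Linear patch embedding: multiply each flattened patch by a
    (patch_dim x embed_dim) matrix. A trained model learns these weights."""
    embed_dim = len(weights[0])
    return [
        [sum(p[i] * weights[i][j] for i in range(len(p))) for j in range(embed_dim)]
        for p in patches
    ]


def build_sequence(image: Image, patch_size: int, weights: Matrix,
                   num_det_tokens: int) -> Matrix:
    """Patch tokens followed by detection tokens (assumed YOLOS-style layout)."""
    patch_tokens = embed_patches(patchify(image, patch_size), weights)
    embed_dim = len(weights[0])
    # Hypothetical placeholder [DET] tokens: zeros here, learnable in a real model.
    det_tokens = [[0.0] * embed_dim for _ in range(num_det_tokens)]
    return patch_tokens + det_tokens


if __name__ == "__main__":
    # A 4x4 RGB image where every channel value is 1.0.
    image = [[[1.0, 1.0, 1.0] for _ in range(4)] for _ in range(4)]
    patch_dim = 2 * 2 * 3  # patch_size * patch_size * channels
    weights = [[0.5] * 8 for _ in range(patch_dim)]  # illustrative, not learned
    sequence = build_sequence(image, 2, weights, num_det_tokens=100)
    print(len(sequence), len(sequence[0]))  # 104 8
```

Running it prints `104 8`: four patch tokens from the 4x4 image plus 100 detection tokens, each an 8-dimensional vector. In the YOLOS design as we understand it, the encoder's output for each [DET] token is then passed to small prediction heads that produce one bounding box and class label.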