Company:
Date Published:
Author: Frederik Hvilshøj
Word count: 1379
Language: English
Hacker News points: None

Summary

Segment Anything Model 3 (SAM 3), described in a paper under review for ICLR 2026, advances computer vision by extending its predecessors with Promptable Concept Segmentation (PCS). Where SAM 1 and SAM 2 relied on geometric prompts such as points and boxes, SAM 3 segments every instance of a user-defined concept across images and video sequences, unifying image, video, and text in a single architecture. The model is open-vocabulary: concepts can be specified as short noun phrases or as visual exemplars, which makes it valuable for robotics, scientific imaging, and AI data pipelines. Architecturally, it pairs a DETR-based object detector with a memory-based tracker to handle both detection and tracking, and it introduces a new data engine that combines human expertise with AI to scale annotation. SAM 3 outperforms previous benchmarks in open-vocabulary segmentation and tracking, though it still struggles with fine-grained categories and long referring expressions. If released as open source, SAM 3 could reshape vision-language segmentation, concept-level search, and interactive annotation, paving the way for more advanced AI infrastructure.
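To make the shift from geometric to concept prompting concrete, here is a minimal Python sketch of what a PCS-style interface might look like. All names (`ConceptPrompt`, `segment_concept`, `InstanceMask`) are illustrative assumptions, not the real SAM 3 API, and the "image" is a toy list of labels standing in for real pixels.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical sketch of Promptable Concept Segmentation (PCS).
# These names are assumptions for illustration, not the SAM 3 API.

@dataclass
class ConceptPrompt:
    # A concept can be given as a short noun phrase...
    noun_phrase: Optional[str] = None
    # ...and/or as visual exemplars (here, bounding boxes).
    exemplar_boxes: List[Tuple[int, int, int, int]] = field(default_factory=list)

@dataclass
class InstanceMask:
    instance_id: int
    score: float

def segment_concept(image: List[str], prompt: ConceptPrompt) -> List[InstanceMask]:
    """Toy stand-in: the 'image' is a list of labeled objects.

    A real model would return a pixel mask per matching instance;
    the key point is that ONE concept prompt yields ALL instances,
    unlike SAM 1/2, where one geometric prompt yields one mask.
    """
    return [
        InstanceMask(instance_id=i, score=0.9)
        for i, label in enumerate(image)
        if prompt.noun_phrase is not None and prompt.noun_phrase in label
    ]

masks = segment_concept(
    image=["dog", "dog", "cat"],
    prompt=ConceptPrompt(noun_phrase="dog"),
)
print(len(masks))  # both "dog" instances are segmented at once
```

The design choice the sketch highlights is the prompt type: a concept prompt names *what* to find, so the model enumerates every match, whereas a geometric prompt only points at *where* one object is.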