
Segment Anything Model 3 (SAM 3): What to Expect from the Next Generation of Foundation Segmentation Models

Blog post from Encord

Post Details
Company
Encord
Date Published
Author
Frederik Hvilshøj
Word Count
1,379
Language
English
Hacker News Points
-
Summary

Segment Anything Model 3 (SAM 3), described in a paper under review for ICLR 2026, represents a significant advance in computer vision, extending its predecessors' capabilities through Promptable Concept Segmentation (PCS). Unlike SAM 1 and SAM 2, which relied on geometric prompts (points, boxes, and masks), SAM 3 can segment all instances of a user-defined concept across images and video sequences, integrating image, video, and text in a unified architecture. This open-vocabulary model identifies concepts from noun phrases or visual exemplars, making it valuable for applications in robotics, scientific imaging, and AI data pipelines. Its architecture pairs a DETR-based object detector with a memory-based tracker to strengthen detection and tracking. SAM 3 also introduces a new data engine that combines human expertise with AI models to scale annotation. It surpasses prior results on open-vocabulary segmentation and tracking benchmarks, though it still struggles with fine-grained categories and long expressions. If released as open source, SAM 3 could transform vision-language segmentation, concept-level search, and interactive annotation, paving the way for more capable AI data infrastructure.