
Segment Anything Model 3 (SAM 3): What to Expect from the Next Generation of Foundation Segmentation Models

Blog post from Encord

Post Details
Company
Encord
Date Published
Author
Frederik Hvilshøj
Word Count
1,379
Language
English
Hacker News Points
-
Summary

Segment Anything Model 3 (SAM 3), described in a paper under review for ICLR 2026, represents a significant advance in computer vision, extending its predecessors' capabilities through Promptable Concept Segmentation (PCS). Unlike SAM 1 and SAM 2, which relied on geometric prompts (points, boxes, and masks), SAM 3 can segment all instances of a user-defined concept across images and video sequences, integrating image, video, and text in a unified architecture. This open-vocabulary model identifies concepts from noun phrases or visual exemplars, making it valuable for applications in robotics, scientific imaging, and AI data pipelines. Its architecture pairs a DETR-based object detector with a memory-based tracker to strengthen detection and tracking. SAM 3 also introduces a new data engine that combines human expertise with AI models to scale annotation. It surpasses prior results on open-vocabulary segmentation and tracking benchmarks, though it still struggles with fine-grained categories and long expressions. If released as open source, SAM 3 could transform vision-language segmentation, concept-level search, and interactive annotation, paving the way for more capable AI data infrastructure.