Introduction to Semantic Segmentation

Company

Encord

Date Published

July 14, 2023

Author

Nikolaj Buhl

Word count

2033

Language

English

Hacker News points

None

URL

encord.com/blog/guide-to-semantic-segmentation

Summary

Semantic segmentation is a computer vision task that assigns a class label to every pixel in an image or video frame, giving granular information about the entities it contains. Models are trained to produce segmentation masks that both recognize and localize those entities. The task is closely related to object detection, but it identifies objects at the pixel level instead of drawing bounding boxes around them. Image segmentation has three sub-categories: instance segmentation, semantic segmentation, and panoptic segmentation. Instance segmentation identifies discrete items such as individual cars and people, while semantic segmentation gives every pixel of a class the same label without regard for independent entities. Panoptic segmentation combines the two approaches to present a unified picture of discrete objects and background entities. Semantic segmentation models borrow from image classification models and extend them by assigning each pixel to a pre-defined class, so the output classifies objects and localizes them at the same time. The task has applications across industries, including medical imaging, autonomous vehicles, agriculture, and image manipulation. Its main drawback is that it cannot distinguish between different occurrences of the same object, a limitation that panoptic segmentation addresses. Popular architectures for semantic segmentation include Fully Convolutional Networks (FCN), DeepLab, and U-Net, each with variations that improve upon the original architecture.
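
To make the idea of labeling each pixel concrete, here is a minimal sketch in plain Python. It is not taken from the article: the class names, the toy scores, and the helper argmax_mask are all hypothetical, and a real model would produce the per-pixel scores with a network such as FCN, DeepLab, or U-Net. The sketch only shows the last step, turning per-pixel class scores into a segmentation mask.

```python
# Hypothetical label set for illustration; not from the article.
CLASS_NAMES = ["background", "road", "car"]


def argmax_mask(scores):
    """Return a mask holding, for each pixel, the index of its highest-scoring class.

    scores[row][col] is a list with one score per class.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# Toy 2x4 "image": made-up per-pixel class scores standing in for model output.
scores = [
    [[0.1, 0.2, 0.7], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]],
    [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.1, 0.6, 0.3], [0.1, 0.7, 0.2]],
]

mask = argmax_mask(scores)
for row in mask:
    print(" ".join(CLASS_NAMES[label] for label in row))
```

In the top row, the leftmost pixel and the two rightmost pixels are separated by a background pixel, so they could belong to two different cars, yet all three receive the same "car" label. That is the limitation described above: the mask records which class each pixel belongs to, not which object.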