Segmentation refers to the process of labeling every point in an object with a semantic label. For images, this specifically refers to giving every pixel within the image a label. Intuitively, it can be thought of as providing a classification of an input image at the pixel level for a given set of known classes.

Image segmentation and object detection are closely linked. Image segmentation typically provides more information, but uses more compute compared to object detection due to the requirement of producing labeling every pixel. Object detection produces a bounding box in image coordinates. This is useful to know where and if an object exists spatially in a 2D image. Image segmentation is also used to determine what pixels belong to the class.

One use case for segmentation is to determine an object location in a 3D scene by fusing the segmentation mask produced by a segmentation network with the corresponding depth information.