ESS

ESS stands for Efficient Semi-Supervised stereo disparity and was developed by NVIDIA. The ESS DNN is used to predict the disparity for each pixel from stereo camera image pairs. This network has improvements over classic CV approaches that use epipolar geometry to compute disparity, because the DNN can learn to predict disparity in cases where epipolar geometry feature matching fails. The semi-supervised learning and stereo disparity matching makes the ESS DNN robust in environments unseen in the training datasets and with occluded objects. This DNN is optimized for and evaluated with color (RGB) global shutter stereo camera images and accuracy can vary for monochrome stereo images used in analytic computer vision approaches to stereo disparity.

The predicted disparity values represent the distance a point moves from one image to the other in a stereo image pair (a.k.a. the binocular image pair). The disparity is inversely proportional to the depth, that is, disparity = focalLength x baseline / depth. Given the focal length and baseline of the camera that generates a stereo image pair, the predicted disparity map from the isaac_ros_ess package can be used to compute depth and generate a point cloud.

Repositories and Packages

The Isaac ROS implementations of this technology are available here:

Isaac ROS Depth Estimation using ESS