The FoundationPose model is a pre-trained deep learning model developed by NVLabs. It estimates the 6DoF pose of an 3D object using the object’s image, depth map, detection bounding box and 3D CAD model without any need to retrain for different and novel objects. The output from the network is the estimated vector containing the pose and tracking information of the object.

Compared to other pose estimation models, the FoundationPose model provides significant improvement on zero-shot objects, without need to further fine tune the network. While FoundationPose pose estimation is compute intensive, inference can be run at a lower frequency. This can be complemented with FoundationPose tracking, which runs at over 120 FPS on Jetson Orin.

Repositories and Packages

The Isaac ROS implementation of this model is available here: