DOPE (Deep Object Pose Estimation) requires a pre-trained model. Input images may need to be cropped and resized to maintain the aspect ratio and match the input resolution of DOPE. After DOPE has produced an estimate, the DNN decoder will use the specified object type to transform using belief maps to output object poses.

NVLabs has provided a DOPE pre-trained model using the HOPE dataset. HOPE stands for Household Objects for Pose Estimation. HOPE is a research-oriented dataset using toy grocery objects and 3D textured meshes of the objects for training on synthetic data. To use DOPE for other objects that are relevant to your application, it needs to be trained with another dataset targeting these objects. For example, DOPE has been trained to detect dollies for use with a mobile robot that navigates under, lifts, and moves that type of dolly.

Repositories and Packages

The Isaac ROS implementations of this technology are available here: