Pose Estimation

In order to manipulate an object, a robot must be able to estimate the pose of the target relative to itself. Pose estimation processes input camera images to infer the 6-DoF pose of the target object in the scene.

Deep learning models have revolutionized pose estimation over classical techniques. DNN models with architectures such as DOPE (Deep Object Pose Estimation) and CenterPose from NVIDIA Research can be trained with monocular camera images of real-world objects to estimate the pose of the object in a novel scene with relatively high accuracy.