Isaac for Manipulation#

Figure: Isaac for Manipulation performing pick and place with a robot simulated in Isaac Sim.

Isaac for Manipulation is a collection of GPU-accelerated packages for perception-driven manipulation, providing capabilities such as object detection, pose estimation, and time-optimal collision-free motion generation using cuMotion. These packages are distributed in individual repositories as part of Isaac ROS to maximize reuse and flexibility.

The Isaac Manipulator repository contains reference workflows that currently leverage the following Isaac packages:

Many deployments would also benefit from one or more of the following Isaac packages:

Reference Architecture#

The Isaac for Manipulation Reference Architecture provides a high-level overview of the components used in these workflows.

Setup Guide#

Tutorials#

The tutorials detail options for implementing Isaac for Manipulation workflows through a specific example. You can develop similar workflows customized to your own application.

Packages#

Application Notes and Limitations#

  • The reference workflows have been tested on Jetson AGX Thor with the following camera configurations:

      • One or two RealSense D455 cameras

    Combinations of different depth cameras have not been tested.

  • The maximum number of cameras is constrained both by hardware limits (available USB 3 bandwidth or the number of GMSL ports) and by performance considerations. In particular, environment reconstruction using Nvblox has been tested with at most two cameras; the bandwidth sketch after this list gives a rough sense of why.

  • For workflows that involve object perception using RT-DETR, FoundationPose, or DOPE, only a single camera is used for that purpose. A second camera may be used together with the first for environment reconstruction using Nvblox.

  • The overhead associated with object perception is lower for the pick-and-place workflow than for object following: in the former, the perception models run on demand (via an action call), while in the latter they run on every input frame. For object following, the input frame rate is limited by a Drop node to reduce overhead and avoid wasting work on stale detections; a minimal sketch of this frame-dropping pattern appears after this list.

  • As is common with cameras such as the RealSense D455, the computed depth may be inaccurate on shiny surfaces. If the robot itself is reflective, inaccurate depth may produce spurious points in the point cloud that are not filtered out by the cuMotion robot segmentation node, which operates in three dimensions. These spurious points then manifest as occupied voxels in the 3D reconstruction computed by Nvblox, possibly causing planning failures or suboptimal motion plans.

    It is recommended that you visually inspect the depth image returned by the camera to verify its accuracy. If necessary, repositioning the camera or adjusting the lighting in the environment can often improve depth quality. Filtering the depth image, for example with a combination of erosion and dilation (sketched after this list), can help ensure that poor depth samples are filtered by robot segmentation, albeit with some risk that points corresponding to true obstacles are filtered as well. Using multiple cameras can reduce the likelihood that a poor depth sample results in an incorrectly occupied voxel in the reconstruction produced by Nvblox.

  • Intel RealSense depth cameras may lose connection and cause errors unless connected to a USB 3 port with a high-quality USB cable, especially if the cable length exceeds one meter.

  • Use of multiple RealSense depth cameras may demand more power than the USB 3 ports on Jetson AGX Thor can reliably provide, leading to instability. If this occurs, consider using a powered USB hub meeting the USB 3.2 Gen 1+ standard (for example, StarTech model HB30A7AME).

  • In two-camera configurations, we recommend the custom mesh or cuboid approaches to object attachment, which reduce system load and thus improve reliability; the cuboid idea is sketched after this list.

  • The system may throttle due to over-current, especially when running multiple cameras and large neural network models. Throttling reduces system performance and can cause jitter in the robotics pipeline and nondeterministic behavior. Refer to Hardware Setup for more details. To operate at peak performance, we recommend reducing the system load or disabling over-current throttling.
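
A back-of-the-envelope calculation illustrates the USB 3 bandwidth constraint above. The sketch below estimates the raw stream payload per camera; the resolutions, pixel formats, frame rate, and the ~400 MB/s practical USB budget are illustrative assumptions, not measured values for any particular configuration.

```python
# Rough USB 3 bandwidth estimate for RealSense-class cameras.
# All stream parameters below are illustrative assumptions.

DEPTH = (1280, 720, 2)   # width, height, bytes per pixel (16-bit depth)
COLOR = (1280, 720, 2)   # width, height, bytes per pixel (YUYV color)
FPS = 30

def stream_mb_per_s(width: int, height: int, bpp: int, fps: int) -> float:
    """Raw payload of one stream in MB/s, ignoring protocol overhead."""
    return width * height * bpp * fps / 1e6

per_camera = sum(stream_mb_per_s(w, h, b, FPS) for w, h, b in (DEPTH, COLOR))
usb3_budget = 400.0  # MB/s; rough usable fraction of USB 3.2 Gen 1 (5 Gbit/s)

for n in (1, 2, 3):
    total = n * per_camera
    print(f"{n} camera(s): {total:6.1f} MB/s "
          f"({100 * total / usb3_budget:4.1f}% of ~{usb3_budget:.0f} MB/s budget)")
```

Under these assumptions, two cameras already consume over half the practical budget before protocol overhead, IMU, and metadata traffic are counted, which is consistent with the two-camera limit noted above.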
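
The Drop node mentioned in the object-following note caps how many frames reach the perception models. The standalone rclpy node below is a minimal sketch of that pattern, forwarding x out of every y frames; it is an illustration, not the actual Isaac ROS Drop node, and the topic names and parameter defaults are assumptions.

```python
# Illustrative ROS 2 node that forwards x out of every y incoming frames,
# mimicking the rate-limiting role a drop node plays ahead of perception.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image


class FrameDropper(Node):
    def __init__(self):
        super().__init__('frame_dropper')
        # Forward the first x frames of every window of y frames.
        self.x = self.declare_parameter('x', 1).value
        self.y = self.declare_parameter('y', 6).value
        self.count = 0
        self.pub = self.create_publisher(Image, 'image_dropped', 10)
        self.sub = self.create_subscription(Image, 'image', self.on_image, 10)

    def on_image(self, msg: Image) -> None:
        if self.count < self.x:
            self.pub.publish(msg)               # pass this frame downstream
        self.count = (self.count + 1) % self.y  # drop the rest of the window


def main():
    rclpy.init()
    rclpy.spin(FrameDropper())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```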
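
To make the erosion-and-dilation suggestion concrete, the sketch below applies morphological opening to the validity mask of a 16-bit depth image, removing isolated specks such as spurious returns from reflective surfaces. The kernel size and the convention that zero means "no depth" are assumptions for the example.

```python
# Minimal sketch: clean a depth image by eroding then dilating its validity
# mask (morphological opening), so isolated specks of "valid" depth vanish
# while larger coherent surfaces survive roughly intact.
import cv2
import numpy as np

def filter_depth(depth_mm: np.ndarray, kernel_size: int = 5) -> np.ndarray:
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    valid = (depth_mm > 0).astype(np.uint8)       # 1 where depth is valid
    opened = cv2.dilate(cv2.erode(valid, kernel), kernel)
    out = depth_mm.copy()
    out[opened == 0] = 0                          # invalidate removed samples
    return out

# Example: a synthetic frame with one real surface and one spurious speck.
depth = np.zeros((480, 640), np.uint16)
depth[100:300, 200:400] = 1500   # a surface at 1.5 m
depth[10, 10] = 800              # an isolated spurious sample
print(np.count_nonzero(depth), '->', np.count_nonzero(filter_depth(depth)))
```

As with any morphological opening, thin valid structures can be shaved off along with the noise, which matches the caveat above that points corresponding to true obstacles may occasionally be filtered too.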
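
The cuboid approach to object attachment replaces the attached object's full mesh with a single box for collision checking, which is much cheaper to evaluate. The sketch below shows only the underlying idea, computing a padded axis-aligned bounding cuboid from mesh vertices; the vertex data and padding value are hypothetical, and the actual attachment is configured through cuMotion rather than a helper like this.

```python
# Minimal sketch: approximate an attached object by an axis-aligned bounding
# cuboid computed from its mesh vertices. One box is far cheaper to check
# against the environment than a full mesh, at the cost of a looser shape.
import numpy as np

def bounding_cuboid(vertices: np.ndarray, padding: float = 0.005):
    """Return (center, size) of the padded axis-aligned box, in meters."""
    lo = vertices.min(axis=0) - padding
    hi = vertices.max(axis=0) + padding
    return (lo + hi) / 2.0, hi - lo

# Example with a hypothetical 10 cm x 6 cm x 4 cm part sampled as vertices.
verts = np.random.uniform([-0.05, -0.03, -0.02], [0.05, 0.03, 0.02], (1000, 3))
center, size = bounding_cuboid(verts)
print('center:', np.round(center, 3), 'size:', np.round(size, 3))
```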

Release Notes#

| Date | Changes |
| --- | --- |
| 2025-10-20 | Added multi-object pick-and-place and gear assembly workflows. Unified the workflow launch files, adopted non-blocking CUDA streams throughout the pipeline, and added multi-camera support with DNN stereo depth, along with various other optimizations. |
| 2025-02-01 | Initial release (pose-to-pose, object-following, and pick-and-place workflows). |