Manipulation Workflow Configuration Guide#

This is a quick reference for configuring the manipulation workflow YAML files used with workflows.launch.py. For a complete parameters reference, refer to the Create Manipulator Configuration File section.

Note

Sample Configuration Files: You can find examples in the isaac_ros_manipulation_bringup package’s params directory. These sample files contain valid parameter combinations for different robot setups.

Required Parameters#

Note

Editing Configuration Files:

  • Binary (Debian) Installation: Configuration files are read-only. Copy reference files before editing.

  • Source Installation: Edit in place or copy to a separate location.

Refer to Configure Your Workflow in the Pick and Place Tutorial for complete copy-and-edit instructions.

These parameters must be explicitly set in your configuration file:

Required Configuration Parameters#

| Parameter | Options | Description |
|---|---|---|
| workflow_type | PICK_AND_PLACE, POSE_TO_POSE, OBJECT_FOLLOWING, GEAR_ASSEMBLY | Defines the manipulation workflow to execute |
| camera_type | REALSENSE, ISAAC_SIM | Selects the camera driver and configuration |

Configuration Example#

# Required Parameters (defaults shown)
workflow_type: PICK_AND_PLACE
camera_type: REALSENSE

Optional Parameters#

Perception workflow#

Configure the perception workflow, which provides depth estimation, object detection, segmentation, and pose estimation. Defaults are provided but can be customized:

Perception Workflow Parameters#

depth_type

  • Options: ESS_FULL, ESS_LIGHT, REALSENSE, FOUNDATION_STEREO_LOW_RES, FOUNDATION_STEREO_HIGH_RES, ISAAC_SIM

  • Default: ESS_FULL

  • Description: Depth estimation method. The actual depth source varies by camera type and configuration; see Depth Configuration Compatibility for supported combinations. You must also set the corresponding engine file path; see Engine File Path Configuration below.

  • When to use: The default ESS_FULL gives real-time DNN depth. Drop to ESS_LIGHT on compute-constrained platforms where ESS_FULL saturates the system. Use FOUNDATION_STEREO_LOW_RES or FOUNDATION_STEREO_HIGH_RES for higher-quality depth on accuracy-sensitive tasks; both are slower (roughly 0.5 to 2 seconds per frame) and need more GPU memory (8 GB or 16 GB). Use REALSENSE for the fastest bring-up with the camera’s on-board stereo, with no DNN processing.

object_detection_type

  • Options: RTDETR, GROUNDING_DINO, DOPE, SEGMENT_ANYTHING, SEGMENT_ANYTHING2

  • Default: RTDETR

  • Description: Object detection network for locating objects in the scene.

  • When to use: RTDETR is the general-purpose default. Use GROUNDING_DINO when you want language-prompted detection (set the text prompt in the config file). Use DOPE for the DOPE training pipeline with 6-DOF output baked into the detector. Use SEGMENT_ANYTHING or SEGMENT_ANYTHING2 for mask-driven detection paired with FOUNDATION_POSE (see the Supported Model Combinations table).

segmentation_type

  • Options: SEGMENT_ANYTHING, SEGMENT_ANYTHING2, NONE

  • Default: NONE

  • Description: Segmentation network for extracting object masks.

  • When to use: Leave at NONE for detection-only pipelines. Switch to SEGMENT_ANYTHING or SEGMENT_ANYTHING2 when you need masks to feed FOUNDATION_POSE for fine-grained pose estimation.

pose_estimation_type

  • Options: FOUNDATION_POSE, DOPE

  • Default: FOUNDATION_POSE

  • Description: 6-DOF pose estimation network for determining object poses.

  • When to use: The default FOUNDATION_POSE is mesh-based 6-DOF and works across most pipelines. Set DOPE only when you’re using the DOPE detector (the two are paired).

Configuration Example#

# Perception Workflow (optional - defaults shown)
object_detection_type: RTDETR
segmentation_type: NONE
pose_estimation_type: FOUNDATION_POSE
depth_type: ESS_FULL

Engine File Path Configuration#

Each DNN-based depth type has a dedicated engine file path parameter in the configuration file. The workflow automatically selects the correct engine based on the depth_type you choose. You only need to verify that the paths are correct for your platform during initial setup.

Engine File Path Parameters#

| depth_type | Configuration Parameter | Default Engine File |
|---|---|---|
| ESS_FULL | ess_full_engine_file_path | ess.engine |
| ESS_LIGHT | ess_light_engine_file_path | light_ess.engine |
| FOUNDATION_STEREO_LOW_RES | foundationstereo_low_res_engine_file_path | foundationstereo_320x736.engine |
| FOUNDATION_STEREO_HIGH_RES | foundationstereo_high_res_engine_file_path | foundationstereo_576x960.engine |
| REALSENSE | (none) | Uses native RealSense depth |
| ISAAC_SIM | (none) | Uses simulation depth |

The sample configuration files include all engine paths pre-configured. To switch depth types, change only the depth_type parameter:

# All engine paths are set once during initial setup.
# Only change the ``depth_type`` to switch between models.
depth_type: ESS_FULL

ess_full_engine_file_path: ${ISAAC_ROS_WS}/isaac_ros_assets/models/dnn_stereo_disparity/dnn_stereo_disparity_v4.1.0_onnx_trt10.13/ess.engine
ess_light_engine_file_path: ${ISAAC_ROS_WS}/isaac_ros_assets/models/dnn_stereo_disparity/dnn_stereo_disparity_v4.1.0_onnx_trt10.13/light_ess.engine
foundationstereo_low_res_engine_file_path: ${ISAAC_ROS_WS}/isaac_ros_assets/models/foundationstereo/deployable_foundation_stereo_small_v1.0/foundationstereo_320x736.engine
foundationstereo_high_res_engine_file_path: ${ISAAC_ROS_WS}/isaac_ros_assets/models/foundationstereo/deployable_foundation_stereo_small_v1.0/foundationstereo_576x960.engine

Note

Only the engine file for your chosen depth_type needs to be installed. The workflow validates the active engine path at launch and will report a clear error if the file is missing.

Important

FOUNDATION_STEREO_HIGH_RES requires 16 GB of GPU memory during TensorRT engine conversion, while FOUNDATION_STEREO_LOW_RES requires 8 GB. The model resolution is selected during model installation via the FOUNDATIONSTEREO_MODEL_RES environment variable.

Supported Model Combinations#

Not all model combinations are compatible. The following table lists valid combinations for the perception workflow:

| Object Detection Type | Segmentation Type | Pose Estimation Type |
|---|---|---|
| DOPE | NONE | DOPE |
| GROUNDING_DINO | NONE | FOUNDATION_POSE |
| GROUNDING_DINO | SEGMENT_ANYTHING | FOUNDATION_POSE |
| GROUNDING_DINO | SEGMENT_ANYTHING2 | FOUNDATION_POSE |
| RTDETR | NONE | FOUNDATION_POSE |
| RTDETR | SEGMENT_ANYTHING | FOUNDATION_POSE |
| RTDETR | SEGMENT_ANYTHING2 | FOUNDATION_POSE |
| SEGMENT_ANYTHING | SEGMENT_ANYTHING | FOUNDATION_POSE |
| SEGMENT_ANYTHING2 | SEGMENT_ANYTHING2 | FOUNDATION_POSE |
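As a concrete illustration, a language-prompted, mask-driven pipeline from the table above would be configured with the following perception parameters (any other row of the table can be substituted the same way):

```yaml
# One valid combination from the table:
# language-prompted detection, with masks feeding FoundationPose
object_detection_type: GROUNDING_DINO
segmentation_type: SEGMENT_ANYTHING
pose_estimation_type: FOUNDATION_POSE
```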

Depth Configuration Compatibility#

The actual depth source depends on your camera type and configuration. Each camera type supports different depth methods:

RealSense Cameras

| Depth Type | enable_dnn_depth_in_realsense | Actual Depth Source |
|---|---|---|
| Any depth type | false | Native RealSense depth (overrides depth_type) |
| ESS_FULL | true | ESS Full DNN depth |
| ESS_LIGHT | true | ESS Light DNN depth |
| FOUNDATION_STEREO_LOW_RES | true | Foundation Stereo DNN depth (320x736) |
| FOUNDATION_STEREO_HIGH_RES | true | Foundation Stereo DNN depth (576x960) |
| ISAAC_SIM | Any value | Simulation depth |

Isaac Sim Cameras

| Supported Depth Type | Actual Depth Source |
|---|---|
| ISAAC_SIM | Simulation depth |

Note

Key Configuration Rules:

  • RealSense cameras have native depth capability. Set enable_dnn_depth_in_realsense: false to use native depth, or true to use DNN-based depth estimation.

  • Isaac Sim cameras provide simulation depth directly and only support ISAAC_SIM depth type. The enable_dnn_depth_in_realsense parameter is ignored.

  • When using RealSense with enable_dnn_depth_in_realsense: false, the depth_type setting is overridden and native RealSense depth is used regardless.

  • DNN-based depth methods (ESS_FULL, ESS_LIGHT, FOUNDATION_STEREO_LOW_RES, FOUNDATION_STEREO_HIGH_RES) provide better quality but require additional computational resources.

Unsupported Combinations:

  • Isaac Sim cameras cannot use ESS_FULL, ESS_LIGHT, FOUNDATION_STEREO_LOW_RES, FOUNDATION_STEREO_HIGH_RES, or REALSENSE depth types
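For example, to run a RealSense camera with ESS Light DNN depth instead of the camera’s native depth, the relevant settings from the tables above are:

```yaml
# RealSense with DNN-based depth estimation enabled
camera_type: REALSENSE
depth_type: ESS_LIGHT
enable_dnn_depth_in_realsense: true   # false would fall back to native RealSense depth
```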

Note

Engine file paths. Each DNN-based depth type has a dedicated engine file parameter, and the workflow automatically selects the correct engine based on depth_type. Verify that all engine paths are valid for your platform during initial setup. See Engine File Path Configuration for details.

Workspace Configuration#

Workspace Configuration Parameters#

| Parameter | Default | Description |
|---|---|---|
| moveit_collision_objects_scene_file | None | Path to a .scene file defining static obstacles, not visible to cameras, for collision avoidance during motion planning |

Configuration Example#

# Static Obstacles (for collision avoidance)
moveit_collision_objects_scene_file: /path/to/scene.scene

Creating a Static Planning Scene#

To create a .scene file for your workspace, use MoveIt’s RViz interface:

  1. In the RViz window, click Displays > MoveIt > Motion Planning. A new panel titled Motion Planning appears on the left side of the RViz window.

    https://media.githubusercontent.com/media/NVIDIA-ISAAC-ROS/.github/release-4.4/resources/isaac_ros_docs/reference_workflows/isaac_for_manipulation/manipulator_rviz_display_motion_planning.png
  2. In the Scene Objects tab, add obstacles to represent your workspace. Supported geometric primitives are: box, sphere, and cylinder.

    https://media.githubusercontent.com/media/NVIDIA-ISAAC-ROS/.github/release-4.4/resources/isaac_ros_docs/reference_workflows/isaac_for_manipulation/manipulator_scene_objects.png
  3. Export the scene file using the Export button, and set the path to the exported file in the moveit_collision_objects_scene_file parameter of your workflow configuration file.

    After the scene file is configured, the static planning scene loads automatically at launch, once cuMotion is ready for planning queries.

    https://media.githubusercontent.com/media/NVIDIA-ISAAC-ROS/.github/release-4.4/resources/isaac_ros_docs/reference_workflows/isaac_for_manipulation/manipulator_static_planning_scene.png

Warning

Include any workspace obstacles not visible to cameras to prevent collisions.

Advanced Parameters#

These parameters are only needed in specific scenarios:

Advanced Configuration Parameters#

| Parameter | When Required | Default | Description |
|---|---|---|---|
| time_sync_slop | For robot segmenter synchronization | 0.1 | Synchronization threshold (in seconds) for matching joint states with depth images so the robot segmenter can produce segmented depth. Values that are too small prevent messages from synchronizing; values that are too large produce segmented images that lag the robot’s actual state. |
| enable_nvblox | To reduce system load or disable collision avoidance | true | Enables/disables nvblox for 3D scene reconstruction and dynamic collision avoidance. Setting to false reduces system load and allows running without the robot segmenter. |
| enable_dnn_depth_in_realsense | For neural network-based depth estimation with RealSense cameras | false | Enables DNN-based depth estimation for RealSense cameras. See Depth Configuration Compatibility for camera-specific behavior and supported combinations. |

Configuration Example#

# Advanced Parameters
time_sync_slop: 0.1                   # Sync tolerance for joint states and depth images
enable_nvblox: false                  # Reduce system load
enable_dnn_depth_in_realsense: true   # Better depth quality for RealSense

Note

FoundationStereo Timing Consideration: FoundationStereo takes approximately 1.5 seconds per depth frame on Thor (FP32, 576x960); see the Comparing DNNs section. To keep depth images synchronized with joint states from the robot driver, set time_sync_slop to a value close to 3 seconds.

System Performance: Disabling enable_nvblox reduces computational load but eliminates dynamic collision avoidance and robot segmenter capabilities.
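Putting the timing note above into practice, a FoundationStereo configuration pairs the depth type with a relaxed sync tolerance (values taken from this guide; tune them for your platform):

```yaml
# FoundationStereo depth needs a larger sync window than the 0.1 default
depth_type: FOUNDATION_STEREO_HIGH_RES
enable_dnn_depth_in_realsense: true   # required when using a RealSense camera
time_sync_slop: 3.0
```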

Usage#

ros2 launch isaac_ros_manipulation_bringup workflows.launch.py \
   manipulator_workflow_config:=/path/to/your/config.yaml

More Information#