Manipulation Workflow Configuration Guide#

This is a quick reference for configuring the manipulation workflow YAML files used with workflows.launch.py. For a complete parameters reference, refer to the Create Manipulator Configuration File section.

Note

Sample Configuration Files: You can find examples in the isaac_ros_manipulation_bringup package’s params directory. These sample files contain valid parameter combinations for different robot setups.

Required Parameters#

Note

Editing Configuration Files:

  • Binary (Debian) Installation: Configuration files are read-only. Copy reference files before editing.

  • Source Installation: Edit in place or copy to a separate location.

Refer to Configure Your Workflow in the Pick and Place Tutorial for complete copy-and-edit instructions.

These parameters must be explicitly set in your configuration file:

Required Configuration Parameters#

| Parameter | Options | Description |
|---|---|---|
| workflow_type | PICK_AND_PLACE, POSE_TO_POSE, OBJECT_FOLLOWING, GEAR_ASSEMBLY | Defines the manipulation workflow to execute |
| camera_type | REALSENSE, ISAAC_SIM | Selects the camera driver and configuration |

Configuration Example#

# Required Parameters (defaults shown)
workflow_type: PICK_AND_PLACE
camera_type: REALSENSE

Optional Parameters#

Perception workflow#

Configure the perception workflow, which provides depth estimation, object detection, segmentation, and pose estimation. Defaults are provided but can be customized:

Perception Workflow Parameters#

depth_type

  • Options: ESS_FULL, ESS_LIGHT, REALSENSE, FOUNDATION_STEREO_LOW_RES, FOUNDATION_STEREO_HIGH_RES, ISAAC_SIM

  • Default: ESS_FULL

  • Description: Depth estimation method. The actual depth source varies by camera type and configuration; see Depth Configuration Compatibility for supported combinations. You must also set the corresponding engine file path; see Engine File Path Configuration below.

  • When to use: The default ESS_FULL gives real-time DNN depth. Drop to ESS_LIGHT on compute-constrained platforms where ESS_FULL saturates the system. Use FOUNDATION_STEREO_LOW_RES or FOUNDATION_STEREO_HIGH_RES for higher-quality depth on accuracy-sensitive tasks; both are slower (roughly 0.5 to 2 seconds per frame) and need more GPU memory (8 GB or 16 GB). Use REALSENSE for the fastest bring-up with the camera’s on-board stereo, with no DNN processing.

object_detection_type

  • Options: RTDETR, GROUNDING_DINO, DOPE, SEGMENT_ANYTHING, SEGMENT_ANYTHING2

  • Default: RTDETR

  • Description: Object detection network for locating objects in the scene.

  • When to use: RTDETR is the general-purpose default. Use GROUNDING_DINO when you want language-prompted detection (set the text prompt in the config file). Use DOPE for the DOPE training pipeline with 6-DOF output baked into the detector. Use SEGMENT_ANYTHING or SEGMENT_ANYTHING2 for mask-driven detection paired with FOUNDATION_POSE (see the Supported Model Combinations table).

segmentation_type

  • Options: SEGMENT_ANYTHING, SEGMENT_ANYTHING2, NONE

  • Default: NONE

  • Description: Segmentation network for extracting object masks.

  • When to use: Leave at NONE for detection-only pipelines. Switch to SEGMENT_ANYTHING or SEGMENT_ANYTHING2 when you need masks to feed FOUNDATION_POSE for fine-grained pose estimation.

pose_estimation_type

  • Options: FOUNDATION_POSE, DOPE

  • Default: FOUNDATION_POSE

  • Description: 6-DOF pose estimation network for determining object poses.

  • When to use: The default FOUNDATION_POSE is mesh-based 6-DOF and works across most pipelines. Set DOPE only when you’re using the DOPE detector (the two are paired).

Configuration Example#

# Perception Workflow (optional - defaults shown)
object_detection_type: RTDETR
segmentation_type: NONE
pose_estimation_type: FOUNDATION_POSE
depth_type: ESS_FULL

Engine File Path Configuration#

Each DNN-based depth type has a dedicated engine file path parameter in the configuration file. The workflow automatically selects the correct engine based on the depth_type you choose. You only need to verify that the paths are correct for your platform during initial setup.

Engine File Path Parameters#

| depth_type | Configuration Parameter | Default Engine File |
|---|---|---|
| ESS_FULL | ess_full_engine_file_path | ess.engine |
| ESS_LIGHT | ess_light_engine_file_path | light_ess.engine |
| FOUNDATION_STEREO_LOW_RES | foundationstereo_low_res_engine_file_path | foundationstereo_320x736.engine |
| FOUNDATION_STEREO_HIGH_RES | foundationstereo_high_res_engine_file_path | foundationstereo_576x960.engine |
| REALSENSE | (none) | Uses native RealSense depth |
| ISAAC_SIM | (none) | Uses simulation depth |

The sample configuration files include all engine paths pre-configured. To switch depth types, change only the depth_type parameter:

# All engine paths are set once during initial setup.
# Only change the ``depth_type`` to switch between models.
depth_type: ESS_FULL

ess_full_engine_file_path: ${ISAAC_ROS_WS}/isaac_ros_assets/models/dnn_stereo_disparity/dnn_stereo_disparity_v4.1.0_onnx_trt10.13/ess.engine
ess_light_engine_file_path: ${ISAAC_ROS_WS}/isaac_ros_assets/models/dnn_stereo_disparity/dnn_stereo_disparity_v4.1.0_onnx_trt10.13/light_ess.engine
foundationstereo_low_res_engine_file_path: ${ISAAC_ROS_WS}/isaac_ros_assets/models/foundationstereo/deployable_foundation_stereo_small_v1.0/foundationstereo_320x736.engine
foundationstereo_high_res_engine_file_path: ${ISAAC_ROS_WS}/isaac_ros_assets/models/foundationstereo/deployable_foundation_stereo_small_v1.0/foundationstereo_576x960.engine

Note

Only the engine file for your chosen depth_type needs to be installed. The workflow validates the active engine path at launch and will report a clear error if the file is missing.

Important

FOUNDATION_STEREO_HIGH_RES requires 16 GB of GPU memory during TensorRT engine conversion, while FOUNDATION_STEREO_LOW_RES requires 8 GB. The model resolution is selected during model installation via the FOUNDATIONSTEREO_MODEL_RES environment variable.

Supported Model Combinations#

Not all model combinations are compatible. The following table lists valid combinations for the perception workflow:

| Object Detection Type | Segmentation Type | Pose Estimation Type |
|---|---|---|
| DOPE | NONE | DOPE |
| GROUNDING_DINO | NONE | FOUNDATION_POSE |
| GROUNDING_DINO | SEGMENT_ANYTHING | FOUNDATION_POSE |
| GROUNDING_DINO | SEGMENT_ANYTHING2 | FOUNDATION_POSE |
| RTDETR | NONE | FOUNDATION_POSE |
| RTDETR | SEGMENT_ANYTHING | FOUNDATION_POSE |
| RTDETR | SEGMENT_ANYTHING2 | FOUNDATION_POSE |
| SEGMENT_ANYTHING | SEGMENT_ANYTHING | FOUNDATION_POSE |
| SEGMENT_ANYTHING2 | SEGMENT_ANYTHING2 | FOUNDATION_POSE |
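As a concrete illustration, a language-prompted, mask-driven pipeline from the table above would be configured with the following perception parameters (any other row of the table can be substituted the same way):

```yaml
# One valid combination from the table:
# language-prompted detection, with masks feeding FoundationPose
object_detection_type: GROUNDING_DINO
segmentation_type: SEGMENT_ANYTHING
pose_estimation_type: FOUNDATION_POSE
```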

Depth Configuration Compatibility#

The actual depth source depends on your camera type and configuration. Each camera type supports different depth methods:

RealSense Cameras

| Depth Type | enable_dnn_depth_in_realsense | Actual Depth Source |
|---|---|---|
| Any depth type | false | Native RealSense depth (overrides depth_type) |
| ESS_FULL | true | ESS Full DNN depth |
| ESS_LIGHT | true | ESS Light DNN depth |
| FOUNDATION_STEREO_LOW_RES | true | Foundation Stereo DNN depth (320x736) |
| FOUNDATION_STEREO_HIGH_RES | true | Foundation Stereo DNN depth (576x960) |
| ISAAC_SIM | Any value | Simulation depth |

Isaac Sim Cameras

| Supported Depth Type | Actual Depth Source |
|---|---|
| ISAAC_SIM | Simulation depth |

Note

Key Configuration Rules:

  • RealSense cameras have native depth capability. Set enable_dnn_depth_in_realsense: false to use native depth, or true to use DNN-based depth estimation.

  • Isaac Sim cameras provide simulation depth directly and only support ISAAC_SIM depth type. The enable_dnn_depth_in_realsense parameter is ignored.

  • When using RealSense with enable_dnn_depth_in_realsense: false, the depth_type setting is overridden and native RealSense depth is used regardless.

  • DNN-based depth methods (ESS_FULL, ESS_LIGHT, FOUNDATION_STEREO_LOW_RES, FOUNDATION_STEREO_HIGH_RES) provide better quality but require additional computational resources.

Unsupported Combinations:

  • Isaac Sim cameras cannot use ESS_FULL, ESS_LIGHT, FOUNDATION_STEREO_LOW_RES, FOUNDATION_STEREO_HIGH_RES, or REALSENSE depth types
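For example, to run a RealSense camera with ESS Light DNN depth instead of the camera’s native depth, the relevant settings from the tables above are:

```yaml
# RealSense with DNN-based depth estimation enabled
camera_type: REALSENSE
depth_type: ESS_LIGHT
enable_dnn_depth_in_realsense: true   # false would fall back to native RealSense depth
```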

Note

Engine file paths. Each DNN-based depth type has a dedicated engine file parameter, and the workflow automatically selects the correct engine based on depth_type. Verify that all engine paths are valid for your platform during initial setup. See Engine File Path Configuration for details.

Workspace Configuration#

Workspace Configuration Parameters#

| Parameter | Default | Description |
|---|---|---|
| moveit_collision_objects_scene_file | None | Path to a .scene file defining static obstacles, not visible to cameras, for collision avoidance during motion planning |

Configuration Example#

# Static Obstacles (for collision avoidance)
moveit_collision_objects_scene_file: /path/to/scene.scene

Creating a Static Planning Scene#

To create a .scene file for your workspace, use MoveIt’s RViz interface:

  1. In the RViz window, click Displays > MoveIt > Motion Planning. A new panel titled Motion Planning appears on the left side of the RViz window.

    https://media.githubusercontent.com/media/NVIDIA-ISAAC-ROS/.github/release-4.4/resources/isaac_ros_docs/reference_workflows/isaac_for_manipulation/manipulator_rviz_display_motion_planning.png
  2. In the Scene Objects tab, add obstacles to represent your workspace. Supported geometric primitives are: box, sphere, and cylinder.

    https://media.githubusercontent.com/media/NVIDIA-ISAAC-ROS/.github/release-4.4/resources/isaac_ros_docs/reference_workflows/isaac_for_manipulation/manipulator_scene_objects.png
  3. Export the scene file using the Export button, and set the path to the exported file in the moveit_collision_objects_scene_file parameter of your workflow configuration file.

    After the scene file is configured, the static planning scene loads automatically at launch, once cuMotion is ready for planning queries.

    https://media.githubusercontent.com/media/NVIDIA-ISAAC-ROS/.github/release-4.4/resources/isaac_ros_docs/reference_workflows/isaac_for_manipulation/manipulator_static_planning_scene.png

Warning

Include any workspace obstacles not visible to cameras to prevent collisions.

Advanced Parameters#

These parameters are only needed in specific scenarios:

Advanced Configuration Parameters#

| Parameter | When Required | Default | Description |
|---|---|---|---|
| time_sync_slop | For robot segmenter synchronization | 0.1 | Synchronization threshold (in seconds) for matching joint states with depth images so the robot segmenter can produce segmented depth. Values that are too small prevent messages from synchronizing; values that are too large produce segmented images that lag the robot’s actual state. |
| enable_nvblox | To reduce system load or disable collision avoidance | true | Enables/disables nvblox for 3D scene reconstruction and dynamic collision avoidance. Setting to false reduces system load and allows running without the robot segmenter. |
| enable_dnn_depth_in_realsense | For neural network-based depth estimation with RealSense cameras | false | Enables DNN-based depth estimation for RealSense cameras. See Depth Configuration Compatibility for camera-specific behavior and supported combinations. |

Configuration Example#

# Advanced Parameters
time_sync_slop: 0.1                   # Sync tolerance for joint states and depth images
enable_nvblox: false                  # Reduce system load
enable_dnn_depth_in_realsense: true   # Better depth quality for RealSense

Note

FoundationStereo Timing Consideration: FoundationStereo takes approximately 1.5 seconds per depth frame on Thor (FP32, 576x960); see the Comparing DNNs section. To keep depth images synchronized with joint states from the robot driver, set time_sync_slop to a value close to 3 seconds.

System Performance: Disabling enable_nvblox reduces computational load but eliminates dynamic collision avoidance and robot segmenter capabilities.
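Putting the timing note above into practice, a FoundationStereo configuration pairs the depth type with a relaxed sync tolerance (values taken from this guide; tune them for your platform):

```yaml
# FoundationStereo depth needs a larger sync window than the 0.1 default
depth_type: FOUNDATION_STEREO_HIGH_RES
enable_dnn_depth_in_realsense: true   # required when using a RealSense camera
time_sync_slop: 3.0
```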

Usage#

ros2 launch isaac_ros_manipulation_bringup workflows.launch.py \
   manipulator_workflow_config:=/path/to/your/config.yaml

More Information#