Benchmarking

Robots are real-time systems that require complex graphs of heterogeneous computation to perform perception, planning, and control. These graphs need to perform work deterministically and with known latency. The computing platform has a fixed budget for heterogeneous computation (TOPS) and throughput. Computation is typically performed on multiple CPUs, GPUs, and additional special-purpose, fixed-function hardware accelerators.

ros2_benchmark provides the tools for measuring the throughput, latency, and compute utilization of these complex graphs without altering the code under test. The results can be used to make informed design decisions on how a robotics application can best meet its real-time requirements.

The following sections describe how to configure Isaac ROS Benchmark, which builds on ros2_benchmark to benchmark Isaac ROS graphs.

Benchmark Configurations

Performance measurement for graphs of nodes requires:

  • a ROS 2 launch file to launch the benchmark

  • an input YAML file specifying the configuration information

Each Isaac ROS Benchmark launch file details its own benchmark configuration via a comment near the top of the file. For example, the following comment is included immediately following the license header in isaac_ros_benchmark/scripts/isaac_ros_apriltag_node.py:

"""
Performance test for Isaac ROS AprilTagNode.

The graph consists of the following:
- Preprocessors:
    1. PrepResizeNode: resizes images to HD
- Graph under Test:
    1. AprilTagNode: detects AprilTags

Required:
- Packages:
    - isaac_ros_image_proc
    - isaac_ros_apriltag
- Datasets:
    - assets/datasets/r2b_dataset/r2b_storage
"""

Each section of this comment is explained in further detail below.
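Because the header comment follows a regular layout, its "Required" entries can also be read programmatically. The following is a minimal sketch that assumes the comment is the module docstring and is formatted as in the example above. The function name read_required_items and the abbreviated EXAMPLE_SOURCE string are hypothetical and are not part of Isaac ROS Benchmark.

```python
import ast

# Abbreviated stand-in for a benchmark script; only the docstring matters here.
EXAMPLE_SOURCE = '''
# (license header would appear here)
"""
Performance test for Isaac ROS AprilTagNode.

Required:
- Packages:
    - isaac_ros_image_proc
    - isaac_ros_apriltag
- Datasets:
    - assets/datasets/r2b_dataset/r2b_storage
"""
'''


def read_required_items(source: str) -> dict:
    """Collect the entries listed under '- Packages:' and '- Datasets:'."""
    docstring = ast.get_docstring(ast.parse(source)) or ""
    required = {"Packages": [], "Datasets": []}
    current = None  # name of the section currently being read
    for line in docstring.splitlines():
        text = line.strip()
        if text.startswith("- ") and text.endswith(":"):
            # A line such as "- Packages:" opens a new section.
            current = text[2:-1]
        elif text.startswith("- ") and current in required:
            # A line such as "- isaac_ros_apriltag" is an entry in that section.
            required[current].append(text[2:])
    return required
```

Calling read_required_items(EXAMPLE_SOURCE) returns a dictionary with the two package names under "Packages" and the dataset path under "Datasets". Sections that are not of interest here, such as the preprocessor list, are skipped.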

Preprocessors

In some cases, the desired input sequence contains data that is not yet in the appropriate format to be received by the ROS 2 graph under test. For example, in the case of isaac_ros_apriltag_node.py, the input dataset’s images must be resized into HD resolution before being passed into the AprilTag detecting node.

The preprocessing nodes allow for these types of data transformations to be executed before the critical timing section of the benchmark begins, ensuring that there is no undesired penalty in performance.
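The idea of keeping preprocessing outside the timed section can be illustrated with a small, framework-free sketch. This is not how ros2_benchmark is implemented (it operates on ROS 2 messages and nodes). The function run_timed and its callable arguments are hypothetical and only show the ordering: all inputs are prepared first, and the clock runs only while the code under test executes.

```python
import time


def run_timed(raw_inputs, preprocess, graph_under_test):
    """Time only graph_under_test; preprocessing finishes before the clock starts."""
    # Untimed: transform every input into the format the code under test expects.
    prepared = [preprocess(item) for item in raw_inputs]

    latencies = []
    start = time.perf_counter()
    for item in prepared:
        t0 = time.perf_counter()
        graph_under_test(item)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    return {
        "frames": len(prepared),
        "throughput_fps": len(prepared) / elapsed if elapsed > 0 else float("inf"),
        "mean_latency_s": sum(latencies) / len(latencies) if latencies else 0.0,
    }
```

Because the preprocessing loop completes before start is recorded, a slow resize step cannot lower the reported throughput or raise the reported latency of the code under test.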

Graph Under Test

The graph under test is the set of ROS 2 nodes whose performance is measured in a given benchmark. For example, in isaac_ros_apriltag_node.py, the graph under test contains only the Isaac ROS AprilTag detection node.

By contrast, the isaac_ros_apriltag_graph.py benchmark includes multiple nodes in its graph under test:

"""
[...]
- Graph under Test:
    1. RectifyNode: rectifies images
    2. AprilTagNode: detects AprilTags
[...]
"""

The Isaac ROS Benchmark collection of benchmark scripts includes both individual node and composite graphs under test. Node-specific benchmarks, identified by the _node suffix, showcase the absolute maximum performance possible when a node is run in isolation. Larger graph benchmarks, identified by the _graph suffix, present performance in a more typical use case.
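The suffix convention makes it straightforward to sort benchmark scripts by type. The helper below is a hypothetical sketch (benchmark_kind is not part of the benchmark tooling) that classifies a script by its file name alone.

```python
from pathlib import Path


def benchmark_kind(script_path: str) -> str:
    """Return 'node', 'graph', or 'unknown' based on the script's file name suffix."""
    stem = Path(script_path).stem  # file name without directory or ".py"
    if stem.endswith("_node"):
        return "node"
    if stem.endswith("_graph"):
        return "graph"
    return "unknown"
```

For example, isaac_ros_benchmark/scripts/isaac_ros_apriltag_node.py is classified as a node benchmark, and isaac_ros_benchmark/scripts/isaac_ros_apriltag_graph.py as a graph benchmark.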

Required Packages

Each benchmark in the Isaac ROS Benchmark collection includes a different selection of nodes as preprocessors or components of the graph under test. Consequently, each benchmark requires its own specific subset of the Isaac ROS suite of packages in order to successfully run.

For example, the isaac_ros_apriltag_graph.py benchmark directly depends on the isaac_ros_image_proc and isaac_ros_apriltag packages. These packages, along with their own recursive dependencies, must be properly built and sourced prior to running the benchmark.
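One way to confirm that the required packages are visible in the current shell is to look them up in the ament resource index, which sourced ROS 2 workspaces expose through the AMENT_PREFIX_PATH environment variable. The sketch below assumes the standard index layout (share/ament_index/resource_index/packages/<name>). The function find_missing_packages is a hypothetical helper, not part of Isaac ROS Benchmark.

```python
import os
from pathlib import Path


def find_missing_packages(required, prefix_path=None):
    """Return the names in `required` that no sourced install prefix registers."""
    if prefix_path is None:
        # Set when a ROS 2 workspace's setup script has been sourced.
        prefix_path = os.environ.get("AMENT_PREFIX_PATH", "")
    prefixes = [Path(p) for p in prefix_path.split(os.pathsep) if p]

    missing = []
    for name in required:
        marker = Path("share") / "ament_index" / "resource_index" / "packages" / name
        if not any((prefix / marker).exists() for prefix in prefixes):
            missing.append(name)
    return missing
```

Running find_missing_packages(["isaac_ros_image_proc", "isaac_ros_apriltag"]) in the shell used to launch the benchmark should return an empty list. Any names that are returned point to packages that still need to be built or sourced.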

Required Datasets

The Isaac ROS Benchmark scripts use the standard r2b Dataset 2023 collection of input data. Before running any benchmarks, the input datasets must be downloaded by following the instructions in Readme Datasets.

Required Models

Some of the benchmark graphs require loading model files. The models used by a benchmark graph are listed in the benchmark script’s header. By default, a benchmark script expects to find its models under ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models.

Before benchmarking a node that contains a DNN, the DNN must be downloaded and converted to a .plan file for the host system. You can do this using the instructions provided in the table below:

Model

Command

Bi3D

mkdir -p ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/bi3d && \
  cd ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/bi3d && \
  wget 'https://api.ngc.nvidia.com/v2/models/nvidia/isaac/bi3d_proximity_segmentation/versions/2.0.0/files/featnet.onnx' && \
  wget 'https://api.ngc.nvidia.com/v2/models/nvidia/isaac/bi3d_proximity_segmentation/versions/2.0.0/files/segnet.onnx'

ESS

mkdir -p ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/ess && \
  cd ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/ess && \
  wget 'https://api.ngc.nvidia.com/v2/models/nvidia/isaac/dnn_stereo_disparity/versions/3.0.0/files/ess.etlt'

DOPE Ketchup

mkdir -p ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/ketchup && \
  cd ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/ketchup

Download Ketchup.pth model from here to the current directory.

Start the Isaac ROS Docker container before running the next step:

${ISAAC_ROS_WS}/scripts/run_dev.sh && \
  python3 /workspaces/isaac_ros-dev/src/isaac_ros_pose_estimation/isaac_ros_dope/scripts/dope_converter.py --format onnx --input Ketchup.pth --output ketchup.onnx

Create a file config.pbtxt in the current directory by using the configurations provided in setup 4 here

PeopleNet

mkdir -p ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/peoplenet && \
  cd ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/peoplenet && \
  wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplenet/versions/pruned_quantized_v2.3.2/files/resnet34_peoplenet_pruned_int8.etlt' && \
  wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplenet/versions/pruned_quantized_v2.3.2/files/resnet34_peoplenet_pruned_int8.txt' && \
  wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplenet/versions/pruned_quantized_v2.3.2/files/labels.txt' && \
  cp ${ISAAC_ROS_WS}/src/isaac_ros_object_detection/isaac_ros_detectnet/resources/peoplenet_config.pbtxt config.pbtxt

PeopleSemSegNet ShuffleSeg

mkdir -p ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/peoplesemsegnet_shuffleseg && \
  cd ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/peoplesemsegnet_shuffleseg && \
  wget https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplesemsegnet/versions/deployable_shuffleseg_unet_v1.0/files/peoplesemsegnet_shuffleseg_etlt.etlt && \
  wget https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplesemsegnet/versions/deployable_shuffleseg_unet_v1.0/files/peoplesemsegnet_shuffleseg_cache.txt && \
  cp ${ISAAC_ROS_WS}/src/isaac_ros_dnn_inference/resources/peoplesemsegnet_shuffleseg_config.pbtxt config.pbtxt
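After running the download commands, a quick check can confirm that the files landed in the default model location. The sketch below builds the directory from the ISAAC_ROS_WS environment variable, following the path given above. The helpers model_dir and missing_model_files are hypothetical names, and the list of expected files must be supplied by the caller (for example, the file names from the wget commands in the table).

```python
import os
from pathlib import Path


def model_dir(model_name, workspace=None):
    """Default directory in which a benchmark script looks for a model."""
    workspace = workspace or os.environ.get("ISAAC_ROS_WS", "")
    return Path(workspace) / "src" / "ros2_benchmark" / "assets" / "models" / model_name


def missing_model_files(model_name, expected_files, workspace=None):
    """Return the expected file names that are not present in the model directory."""
    directory = model_dir(model_name, workspace)
    return [name for name in expected_files if not (directory / name).is_file()]
```

For the Bi3D row, missing_model_files("bi3d", ["featnet.onnx", "segnet.onnx"]) returns an empty list once both downloads have completed.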

List of Isaac ROS Benchmarks

Note

Prior to running any of the benchmark scripts, your environment must satisfy the prerequisites described in the sections above: the required packages, datasets, and models.

Note

The naming convention _node is used to represent a graph under test that contains a single node (for example, stereo_image_proc_node.py) and _graph to represent a graph of multiple nodes (for example, stereo_image_proc_graph.py).

Name | Description | Dataset Sequence | Launch Command
AprilTag Node | Detect AprilTags | r2b_storage | launch_test isaac_ros_benchmark/scripts/isaac_ros_apriltag_node.py
AprilTag Graph | Rectify image and detect AprilTags | r2b_storage | launch_test isaac_ros_benchmark/scripts/isaac_ros_apriltag_graph.py
Freespace Segmentation Node | Project freespace onto occupancy grid | r2b_lounge | launch_test isaac_ros_benchmark/scripts/isaac_ros_bi3d_fs_node.py
Freespace Segmentation Graph | Create depth segmentation disparity image and project freespace onto occupancy grid | r2b_lounge | launch_test isaac_ros_benchmark/scripts/isaac_ros_bi3d_fs_graph.py
Depth Segmentation Node | Create depth segmentation disparity image | r2b_lounge | launch_test isaac_ros_benchmark/scripts/isaac_ros_bi3d_node.py
CenterPose Pose Estimation Graph | Encode image, run CenterPose model inference on TensorRT, decode output tensor as marker array | r2b_storage | launch_test isaac_ros_benchmark/scripts/isaac_ros_centerpose_graph.py
DetectNet Object Detection Graph | Encode image, run PeopleNet on Triton, decode output tensor as detection array | r2b_hallway | launch_test isaac_ros_benchmark/scripts/isaac_ros_detectnet_graph.py
Stereo Disparity Node | Create stereo disparity image | r2b_datacenter | launch_test isaac_ros_benchmark/scripts/isaac_ros_disparity_node.py
Stereo Disparity Graph | Create stereo disparity image, convert disparity image to point cloud | r2b_datacenter | launch_test isaac_ros_benchmark/scripts/isaac_ros_disparity_graph.py
DNN Image Encoder Node | Encode image as resized, normalized tensor | r2b_hallway | launch_test isaac_ros_benchmark/scripts/isaac_ros_dnn_image_encoder_node.py
DOPE Pose Estimation Graph | Encode image, run DOPE on TensorRT, decode output tensor as pose array | r2b_hallway | launch_test isaac_ros_benchmark/scripts/isaac_ros_dope_graph.py
DNN Stereo Disparity Node | Create ESS-inferred stereo disparity image | r2b_hideaway | launch_test isaac_ros_benchmark/scripts/isaac_ros_ess_node.py
DNN Stereo Disparity Graph | Create ESS-inferred stereo disparity image, convert disparity image to point cloud | r2b_hideaway | launch_test isaac_ros_benchmark/scripts/isaac_ros_ess_graph.py
Occupancy Grid Localizer Node | Estimate pose relative to map | r2b_storage | launch_test isaac_ros_benchmark/scripts/isaac_ros_grid_localizer_node.py
H264 Decoder Node | Decode compressed image | r2b_compressed_image [1] | launch_test isaac_ros_benchmark/scripts/isaac_ros_h264_decoder_node.py
H264 Encoder Node I-frame Support | Encode compressed image (I-frame) | r2b_mezzanine | launch_test isaac_ros_benchmark/scripts/isaac_ros_h264_encoder_iframe_node.py
H264 Encoder Node P-frame Support | Encode compressed image (P-frame) | r2b_mezzanine | launch_test isaac_ros_benchmark/scripts/isaac_ros_h264_encoder_pframe_node.py
Nvblox Node | Generate colorized 3D mesh | r2b_hideaway | launch_test isaac_ros_benchmark/scripts/isaac_ros_nvblox_node.py
Rectify Node | Rectify image | r2b_storage | launch_test isaac_ros_benchmark/scripts/isaac_ros_rectify_node.py
TensorRT Node PeopleSemSegNet | Run PeopleSemSegNet inference on TensorRT | r2b_hallway | launch_test isaac_ros_benchmark/scripts/isaac_ros_tensor_rt_ps_node.py
TensorRT Node DOPE | Run DOPE inference on TensorRT | r2b_hope | launch_test isaac_ros_benchmark/scripts/isaac_ros_tensor_rt_dope_node.py
Triton Node PeopleSemSegNet | Run PeopleSemSegNet inference on Triton | r2b_hallway | launch_test isaac_ros_benchmark/scripts/isaac_ros_triton_ps_node.py
Triton Node DOPE | Run DOPE inference on Triton | r2b_hope | launch_test isaac_ros_benchmark/scripts/isaac_ros_triton_dope_node.py
U-Net Graph | Encode image, run PeopleSemSegNet on TensorRT, decode output tensor as segmentation masks | r2b_hallway | launch_test isaac_ros_benchmark/scripts/isaac_ros_unet_graph.py
Visual SLAM Node | Perform stereo visual simultaneous localization and mapping | r2b_cafe | launch_test isaac_ros_benchmark/scripts/isaac_ros_visual_slam_node.py

Results

The Isaac ROS Performance Summary provides an overview of the benchmark results for the Isaac ROS configurations, along with links to each of the results in JSON format.

Profiling

When seeking to optimize performance, profiling is often used to gain deep insight into the call stack and to identify where processing time is spent in functions. ros2_tracing provides tracing instrumentation to better understand performance on a CPU, but lacks information on GPU acceleration.

Nsight Systems is a freely available tool that provides tracing instrumentation for CPU, GPU, and other SoC (system-on-chip) hardware accelerators on aarch64 and x86_64 platforms. The Isaac ROS team uses this tooling internally to profile Isaac ROS graphs, to optimize individual node-level computation, and to improve synchronization between heterogeneous computing hardware. Combined with the benchmark tooling, these tools allow before-and-after testing to inspect differences between profiles.

Profiling hooks for Nsight Systems have been integrated into the ros2_benchmark scripts for rich annotations. The following command shows an example of how to use Nsight Systems to profile a benchmark on a fiducial detection graph built with Isaac ROS AprilTag:

launch_test src/isaac_ros_benchmark/isaac_ros_benchmark/scripts/isaac_ros_apriltag_node.py enable_nsys:=true nsys_profile_name:=isaac_ros_apriltag_profile

Nsys Parameters | Default | Description
enable_nsys | false | Enable nsys or not
nsys_profile_flags | --trace=osrt,nvtx,cuda | Flags passed to nsys
nsys_profile_name | profile_{machine_type}_{current_time} | Nsys profiling output file name
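When profiling several benchmarks, it can be convenient to assemble the launch arguments in a script. The sketch below builds the argument list for launch_test from the parameters in the table. The function nsys_launch_command is a hypothetical helper, and passing nsys_profile_flags on the command line in the same name:=value form as the other two parameters is an assumption based on the example command above.

```python
def nsys_launch_command(script, profile_name=None, profile_flags=None):
    """Build a launch_test argument list with Nsight Systems profiling enabled."""
    args = ["launch_test", script, "enable_nsys:=true"]
    if profile_name:
        # Omitted: the default name profile_{machine_type}_{current_time} applies.
        args.append(f"nsys_profile_name:={profile_name}")
    if profile_flags:
        # Assumption: flags are overridden with the same name:=value syntax.
        args.append(f"nsys_profile_flags:={profile_flags}")
    return args
```

The returned list can be passed directly to subprocess.run. For the AprilTag example, nsys_launch_command("src/isaac_ros_benchmark/isaac_ros_benchmark/scripts/isaac_ros_apriltag_node.py", profile_name="isaac_ros_apriltag_profile") reproduces the arguments of the command shown earlier.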

Repositories and Packages

The Isaac ROS implementations of this technology are available at the following: