Benchmarking
Robots are real-time systems that require complex graphs of heterogeneous computation to perform perception, planning, and control. These graphs of computation need to perform work deterministically and with known latency. The computing platform has a fixed budget of heterogeneous compute (TOPS) and throughput. Computation is typically performed on multiple CPUs, GPUs, and additional special-purpose, fixed-function hardware accelerators.
ros2_benchmark provides the tools for measuring the throughput, latency, and compute utilization of these complex graphs without altering the code under test. The results can be used to make informed design decisions on how a robotics application can best meet its real-time requirements.
The following describes how to configure benchmarking for Isaac ROS Benchmark, which builds on ros2_benchmark for Isaac ROS graphs.
Benchmark Configurations
Performance measurement for graphs of nodes requires:
- a ROS 2 launch file to launch the benchmark
- an input YAML file specifying the configuration information (a sketch of such a file follows this list)
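The following is a purely hypothetical sketch of a benchmark configuration file. The field names are illustrative only and are not the actual ros2_benchmark schema; consult the configuration files shipped with ros2_benchmark for the real keys.
# Hypothetical benchmark configuration sketch; field names are illustrative,
# not the real ros2_benchmark schema.
benchmark_name: apriltag_node_benchmark             # label reported in the results
input_data_path: datasets/r2b_dataset/r2b_storage   # dataset sequence to play back
publisher_upper_frequency: 1000.0                   # upper bound of the playback sweep, in Hz
publisher_lower_frequency: 10.0                     # lower bound of the playback sweep, in Hz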
Each Isaac ROS Benchmark launch file details its own benchmark configuration via a comment near the top of the file. For example, the following comment is included immediately following the license header in isaac_ros_benchmark/benchmarks/isaac_ros_apriltag_benchmark/scripts/isaac_ros_apriltag_node.py:
"""
Performance test for Isaac ROS AprilTagNode.
The graph consists of the following:
- Preprocessors:
1. PrepResizeNode: resizes images to HD
- Graph under Test:
1. AprilTagNode: detects AprilTags
Required:
- Packages:
- isaac_ros_image_proc
- isaac_ros_apriltag
- Datasets:
- assets/datasets/r2b_dataset/r2b_storage
"""
Each section of this comment is explained in further detail below.
Preprocessors
In some cases, the desired input sequence contains data that is not yet in the appropriate format to be received by the ROS 2 graph under test. For example, in the case of isaac_ros_apriltag_node.py, the input dataset’s images must be resized to HD resolution before being passed into the AprilTag detection node.
The preprocessor nodes allow these data transformations to be executed before the critical timing section of the benchmark begins, ensuring that there is no undesired performance penalty.
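In the benchmark launch file, a preprocessor is declared as an ordinary composable node alongside the graph under test. The following is a minimal Python sketch, assuming the ResizeNode plugin from isaac_ros_image_proc and 720p HD output; the exact parameter values in the shipped benchmark script may differ:
from launch_ros.descriptions import ComposableNode

# Minimal preprocessor sketch: resize incoming images to HD before they reach
# the graph under test (parameter values are assumptions, not the shipped config)
prep_resize_node = ComposableNode(
    name='PrepResizeNode',
    package='isaac_ros_image_proc',
    plugin='nvidia::isaac_ros::image_proc::ResizeNode',
    parameters=[{
        'output_width': 1280,
        'output_height': 720,
    }],
)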
Graph Under Test
The graph under test refers to the core selection of ROS 2 nodes whose performance is to be measured in this specific benchmark. For example, in the case of isaac_ros_apriltag_node.py, only the Isaac ROS AprilTag detection node is included in the graph under test.
By contrast, the isaac_ros_apriltag_graph.py benchmark includes multiple nodes in its graph under test:
"""
[...]
- Graph under Test:
1. RectifyNode: rectifies images
2. AprilTagNode: detects AprilTags
[...]
"""
The Isaac ROS Benchmark collection of benchmark scripts includes both individual-node and composite graphs under test. Node-specific benchmarks, identified by the _node suffix, showcase the absolute maximum performance possible when a node is run in isolation. Larger graph benchmarks, identified by the _graph suffix, present performance in a more typical use case.
Required Packages
Each benchmark in the Isaac ROS Benchmark collection includes a different selection of nodes as preprocessors or components of the graph under test. Consequently, each benchmark requires its own specific subset of the Isaac ROS suite of packages to run successfully.
For example, the isaac_ros_apriltag_graph.py benchmark directly depends on the isaac_ros_image_proc and isaac_ros_apriltag packages. These packages, along with their own recursive dependencies, must be properly built/installed and sourced before running the benchmark.
For benchmarks in the Isaac ROS Benchmark collection, these dependencies are captured in the benchmark packages, so it is sufficient to build/install the benchmark package (isaac_ros_apriltag_benchmark, for instance), as shown in the sketch below.
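A minimal sketch of that workflow, assuming a standard colcon workspace rooted at ${ISAAC_ROS_WS} (the --packages-up-to flag builds the benchmark package together with its recursive dependencies):
cd ${ISAAC_ROS_WS} && \
colcon build --packages-up-to isaac_ros_apriltag_benchmark && \
source install/setup.bash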
Required Datasets
Most Isaac ROS Benchmark scripts use the standard r2b Dataset as input data. Before running any benchmarks, the input datasets must be downloaded by following the instructions here.
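Once downloaded, you can confirm that a sequence is where the benchmark headers expect it; this sketch assumes datasets live under the same ros2_benchmark assets tree as the models described below:
ls ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/datasets/r2b_dataset/r2b_storage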
Required Models
Some of the benchmark graphs require loading model files. Models used by a benchmark graph are listed in the benchmark script’s header. By default, models are expected to be accessible by a benchmark script under ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models.
Before benchmarking a node that contains a DNN, the DNN must be downloaded and converted to a .plan file for the host system. You can do this using the instructions provided below:
Bi3D (proximity segmentation):
mkdir -p ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/bi3d && \
cd ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/bi3d && \
wget 'https://api.ngc.nvidia.com/v2/models/nvidia/isaac/bi3d_proximity_segmentation/versions/2.0.0/files/featnet.onnx' && \
wget 'https://api.ngc.nvidia.com/v2/models/nvidia/isaac/bi3d_proximity_segmentation/versions/2.0.0/files/segnet.onnx'

ESS (DNN stereo disparity):
mkdir -p ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models && \
cd ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models && \
wget 'https://api.ngc.nvidia.com/v2/models/nvidia/isaac/dnn_stereo_disparity/versions/4.0.0/files/dnn_stereo_disparity_v4.0.0.tar.gz' && \
tar -xvf dnn_stereo_disparity_v4.0.0.tar.gz && \
mv dnn_stereo_disparity_v4.0.0 ess

PeopleNet:
mkdir -p ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/peoplenet && \
cd ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/peoplenet && \
wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplenet/versions/pruned_quantized_v2.3.2/files/resnet34_peoplenet_pruned_int8.etlt' && \
wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplenet/versions/pruned_quantized_v2.3.2/files/resnet34_peoplenet_pruned_int8.txt' && \
wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplenet/versions/pruned_quantized_v2.3.2/files/labels.txt' && \
cp ${ISAAC_ROS_WS}/src/isaac_ros_object_detection/isaac_ros_detectnet/resources/peoplenet_config.pbtxt config.pbtxt

PeopleSemSegNet ShuffleSeg:
mkdir -p ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/peoplesemsegnet_shuffleseg && \
cd ${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/peoplesemsegnet_shuffleseg && \
wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplesemsegnet/versions/deployable_shuffleseg_unet_v1.0/files/peoplesemsegnet_shuffleseg_etlt.etlt' && \
wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplesemsegnet/versions/deployable_shuffleseg_unet_v1.0/files/peoplesemsegnet_shuffleseg_cache.txt' && \
cp ${ISAAC_ROS_WS}/src/isaac_ros_dnn_inference/resources/peoplesemsegnet_shuffleseg_config.pbtxt config.pbtxt
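The exact conversion flags for each model are documented with the corresponding Isaac ROS package. As a minimal illustration of the ONNX-to-.plan step, TensorRT's trtexec utility can be used; this sketch assumes trtexec at its default install location and the Bi3D files downloaded above:
/usr/src/tensorrt/bin/trtexec \
  --onnx=${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/bi3d/featnet.onnx \
  --saveEngine=${ISAAC_ROS_WS}/src/ros2_benchmark/assets/models/bi3d/featnet.plan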
List of Isaac ROS Benchmarks
Note
Prior to running any of the benchmark scripts, your environment must satisfy the following prerequisites:
- All required datasets have been downloaded per the README dataset instructions.
- For DNN-based benchmarks, all required DNNs have been prepared per the README model preparation instructions.
- All required packages have been downloaded, built, and sourced.
Note
The naming convention _node is used to represent a graph under test that contains a single node (for example, stereo_image_proc_node.py) and _graph to represent a graph of multiple nodes (for example, stereo_image_proc_graph.py).
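Each benchmark script below is run with launch_test, following the same pattern as the profiling command shown later in this document. For example, assuming the AprilTag graph script ships in the same benchmark package as its node counterpart:
launch_test $(ros2 pkg prefix isaac_ros_apriltag_benchmark)/share/isaac_ros_apriltag_benchmark/scripts/isaac_ros_apriltag_graph.py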
Name | Description | Dataset Sequence | Launch Command
---|---|---|---
AprilTag Node | Detect AprilTags | |
AprilTag Graph | Rectify image and detect AprilTags | |
Freespace Segmentation Node | Project freespace onto occupancy grid | |
Freespace Segmentation Graph | Create depth segmentation disparity image and project freespace onto occupancy grid | |
Depth Segmentation Node | Create depth segmentation disparity image | |
CenterPose Pose Estimation Graph | Encode image, run CenterPose model inference on TensorRT, decode output tensor as marker array | |
DetectNet Object Detection Graph | Encode image, run PeopleNet on Triton, decode output tensor as detection array | |
Stereo Disparity Node | Create stereo disparity image | |
Stereo Disparity Graph | Create stereo disparity image, convert disparity image to point cloud | |
DNN Image Encoder Node | Encode image as resized, normalized tensor | |
DOPE Pose Estimation Graph | Encode image, run DOPE on TensorRT, decode output tensor as pose array | |
DNN Stereo Disparity Node | Create ESS-inferred stereo disparity image | |
DNN Stereo Disparity Graph | Create ESS-inferred stereo disparity image, convert disparity image to point cloud | |
Occupancy Grid Localizer Node | Estimate pose relative to map | |
H264 Decoder Node | Decode compressed image | |
H264 Encoder Node I-frame Support | Encode compressed image (I-frame) | |
H264 Encoder Node P-frame Support | Encode compressed image (P-frame) | |
Nvblox Node | Generate colorized 3D mesh | |
Rectify Node | Rectify image | |
Resize Node | Resize image (1920x1200 to 960x576) | |
TensorRT Node PeopleSemSegNet | Run PeopleSemSegNet inference on TensorRT | |
TensorRT Node DOPE | Run DOPE inference on TensorRT | |
Triton Node PeopleSemSegNet | Run PeopleSemSegNet inference on Triton | |
Triton Node DOPE | Run DOPE inference on Triton | |
U-Net Graph | Encode image, run PeopleSemSegNet on TensorRT, decode output tensor as segmentation masks | |
Segment Anything Graph | Encode image, run SAM on Triton, decode output tensor as segmentation masks | |
Mobile Segment Anything Graph | Encode image, run Mobile SAM on Triton, decode output tensor as segmentation masks | |
RT-DETR Graph | Encode image, run RT-DETR on TensorRT, decode output tensor as detection arrays | |
FoundationPose Node | Detect 6D pose of objects | |
FoundationPose Tracking Node | Track 6D pose of objects | |
CuMotion Planner Node | Create motion plan | N/A |
Visual SLAM Node | Perform stereo visual simultaneous localization and mapping | |
Results
The Isaac ROS Performance Summary provides an overview of the benchmark results for the Isaac ROS configurations, along with links to each of the results in JSON format.
Profiling
When seeking to optimize performance, profiling is often used to gain deep insight into the call stack and to identify where processing time is spent in functions. ros2_tracing provides tracing instrumentation to better understand performance on a CPU, but lacks information on GPU acceleration.
Nsight Systems is a freely available tool that provides tracing instrumentation for CPU, GPU, and other SoC (system-on-chip) hardware accelerators on aarch64 and x86_64 platforms. The Isaac ROS team uses this tooling internally to profile Isaac ROS graphs, to optimize individual node-level computation, and to improve synchronization between heterogeneous computing hardware. These tools allow for before-and-after testing to inspect profile differences with the benchmark tooling.
Profiling hooks for Nsight Systems have been integrated into ros2_benchmark scripts for rich annotations. The following command shows an example of how to use Nsight Systems to profile a benchmark of a fiducial detection graph built with Isaac ROS AprilTag:
launch_test $(ros2 pkg prefix isaac_ros_apriltag_benchmark)/share/isaac_ros_apriltag_benchmark/scripts/isaac_ros_apriltag_node.py enable_nsys:=true nsys_profile_name:=isaac_ros_apriltag_profile
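After the run completes, the capture can be summarized from the command line or opened in the Nsight Systems GUI; this sketch assumes the profile is written with the default .nsys-rep extension:
nsys stats isaac_ros_apriltag_profile.nsys-rep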
Nsys Parameters | Default | Description
---|---|---
enable_nsys | | Enable nsys profiling
nsys_profile_flags | | Flags passed to nsys
nsys_profile_name | | Nsys profiling output file name
Repositories and Packages
The Isaac ROS implementations of this technology are available at the following: