Developer Environment
GXF Failure due to Tensor Stream Creation Failed
When launching an Isaac ROS node on a Jetson accessed remotely with X11 forwarding, X events emitted by CUDA for initializing a tensor stream are captured by the remote monitoring machine instead of the local Jetson.
Symptom
[component_container_mt-2] 2023-09-15 14:33:11.362 ERROR /workspaces/isaac_ros-dev/src/isaac_ros_image_pipeline/isaac_ros_image_proc/gxf/tensorops/extensions/tensorops/components/TensorStream.cpp@104: tensor stream creation failed.
[component_container_mt-2] 2023-09-15 14:33:11.363 ERROR gxf/std/entity_warden.cpp@380: Failed to initialize component 00292 (stream)
[component_container_mt-2] 2023-09-15 14:33:11.363 ERROR gxf/core/runtime.cpp@616: Could not initialize entity 'IPOQAFVQYV_global' (E290): GXF_FAILURE
[component_container_mt-2] 2023-09-15 14:33:11.363 ERROR gxf/std/program.cpp@205: Failed to activate entity 00290 named IPOQAFVQYV_global: GXF_FAILURE
[component_container_mt-2] 2023-09-15 14:33:11.363 ERROR gxf/std/program.cpp@207: Deactivating...
[component_container_mt-2] 2023-09-15 14:33:11.396 ERROR gxf/core/runtime.cpp@1227: Graph activation failed with error: GXF_FAILURE
[component_container_mt-2] [ERROR] [1694759591.396604150] [left_decoder_node]: [NitrosContext] GxfGraphActivate Error: GXF_FAILURE
[component_container_mt-2] [INFO] [1694759591.396906843] [right_decoder_node]: [NitrosContext] Loading application: '/workspaces/isaac_ros-dev/install/isaac_ros_h264_decoder/share/isaac_ros_h264_decoder/EZXPATGEMM.yaml'
[component_container_mt-2] [ERROR] [1694759591.396922620] [left_decoder_node]: [NitrosNode] runGraphAsync Error: GXF_FAILURE
[component_container_mt-2] terminate called after throwing an instance of 'std::runtime_error'
[component_container_mt-2] what(): [NitrosNode] runGraphAsync Error: GXF_FAILURE
Solution
Disabling X11 forwarding will prevent X events from being intercepted by the remote machine, resolving the error.
Docker buildx
plugin not installed
Isaac ROS Dev scripts rely on Docker configured with the buildx
plugin which is no longer installed by default.
Symptom
nvidia@ubuntu:~/workspaces/isaac_ros-dev/src/isaac_ros_common$ ./scripts/run_dev.sh
Building aarch64.ros2_humble.ros1_noetic.user base as image: isaac_ros_dev-aarch64 using key aarch64.ros2_humble.ros1_noetic.user
Using configured docker search paths: /home/nvidia/workspaces/isaac_ros-dev/src/isaac_ros_common/scripts/../../isaac_ros_nitros_bridge/docker /home/nvidia/workspaces/isaac_ros-dev/src/isaac_ros_common/scripts/../docker
Using base image name not specified, using ''
Using docker context dir not specified, using Dockerfile directory
Resolved the following Dockerfiles for target image: aarch64.ros2_humble.ros1_noetic.user
/home/nvidia/workspaces/isaac_ros-dev/src/isaac_ros_common/scripts/../docker/Dockerfile.user
/home/nvidia/workspaces/isaac_ros-dev/src/isaac_ros_common/scripts/../../isaac_ros_nitros_bridge/docker/Dockerfile.ros1_noetic
/home/nvidia/workspaces/isaac_ros-dev/src/isaac_ros_common/scripts/../docker/Dockerfile.aarch64.ros2_humble
Building /home/nvidia/workspaces/isaac_ros-dev/src/isaac_ros_common/scripts/../docker/Dockerfile.aarch64.ros2_humble as image: aarch64-ros2_humble-image with base:
ERROR: BuildKit is enabled but the buildx component is missing or broken.
Install the buildx component to build images with BuildKit:
https://docs.docker.com/go/buildx/
Failed to build base image: isaac_ros_dev-aarch64, aborting.
Solution
From the official Docker install instructions (here), install the
docker-buildx-plugin.
# Add Docker's official GPG key: sudo apt-get update sudo apt-get install ca-certificates curl gnupg sudo install -m 0755 -d /etc/apt/keyrings curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg sudo chmod a+r /etc/apt/keyrings/docker.gpg # Add the repository to Apt sources: echo \ "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \ "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \ sudo tee /etc/apt/sources.list.d/docker.list > /dev/null sudo apt-get update sudo apt install docker-buildx-plugin
Repository cloned without pulling git-lfs
files
Symptom
You fail building some isaac_ros
packages, or some image files in the
repository are empty.
Solution
Many Isaac ROS repositories are configured with Git LFS. For example,
isaac_ros_visual_slam
repository is configured with Git LFS to accommodate
big rosbag files for test.
To check if some of the LFS files are not resolved, you can first list all the files that are supposed to be managed by Git LFS.
cd ${ISAAC_ROS_WS}/src/isaac_ros_common/ && \
git lfs ls-files -s
It should give something like this.
Example of output of
git lfs ls-files -s
:cfbf5fbcee - resources/Isaac_sim_app_launcher.png (68 KB) 448d244a3e - resources/Isaac_sim_app_terminal.png (57 KB) 750f7de371 - resources/graph_read-write-speed.png (77 KB) c29c6d6a3a - resources/isaac_ros_common_tools.png (30 KB) 87e38eea78 - resources/isaac_sim_initial_screen.png (265 KB) e2292bc330 - resources/isaac_sim_nucleus_setup.png (221 KB) ea8b297681 - resources/isaac_sim_ros_bridge.png (300 KB)
Now check the actual file size of those files.
git lfs ls-files -s | awk '{ print $3 }' | xargs ls -l
Example of output of
git lfs ls-files -s | awk '{ print $3 }' | xargs ls -l
:-rw-rw-r- 1 jetson jetson 130 Mar 22 08:56 resources/graph_read-write-speed.png -rw-rw-r- 1 jetson jetson 130 Mar 22 08:56 resources/isaac_ros_common_tools.png -rw-rw-r- 1 jetson jetson 130 Mar 22 08:56 resources/Isaac_sim_app_launcher.png -rw-rw-r- 1 jetson jetson 130 Mar 22 08:56 resources/Isaac_sim_app_terminal.png -rw-rw-r- 1 jetson jetson 131 Mar 22 08:56 resources/isaac_sim_initial_screen.png -rw-rw-r- 1 jetson jetson 131 Mar 22 08:56 resources/isaac_sim_nucleus_setup.png -rw-rw-r- 1 jetson jetson 131 Mar 22 08:56 resources/isaac_sim_ros_bridge.png
In above example, all the file sizes are 130byte or 131byte, much
smaller than what was advertised by git lfs ls-files -s
. This shows
that those LFS files are not really pulled.
Optionally, if you know a directory that should have a large LFS file, you can check the directory size.
du -hs ${ISAAC_ROS_WS}/src/isaac_ros_visual_slam/isaac_ros_visual_slam/test/test_cases/rosbags/
20K /home/username/workspaces/isaac_ros-dev/src/isaac_ros_visual_slam/isaac_ros_visual_slam/test/test_cases/rosbags/
If any of above is the case, you can manually pull the LFS files, after
installing git lfs
.
cd ${ISAAC_ROS_WS}/src/isaac_ros_common && \
git lfs pull
or
cd ${ISAAC_ROS_WS}/src/isaac_ros_visual_slam && \
git lfs pull -X "" -I isaac_ros_visual_slam/test/test_cases/rosbags/
X11-forward issue
Symptom
When you run X11 window application inside the Isaac ROS container, you see an error message:
“X11 connection rejected because of wrong authentication.”
Solution
Go back to your docker host, and re SSH-login from your remote machine
with ssh -X
option.
ssh -X ${USER}@${IP}
Check if you see an error message like this.
/usr/bin/xauth: /home/jetson/.Xauthority not writable, changes will be ignored
/usr/bin/xauth: /home/jetson/.Xauthority not writable, changes ignored
If so, delete the ~/.Xauthority
directory.
sudo rm -rf ~/.Xauthority/
Log out and re-login with ssh -X
.
Check if you have .Xauthority
file with your user as owner.
~$ ll ~/.Xauthority
-rw------- 1 jetson jetson 61 Mar 22 14:08 /home/jetson/.Xauthority
Now, you should be able to achieve X11 forwarding. Try with xeyes
on
your docker host.
Then, launch your container and try running the app that requires X11 forwarding.
ROS 2 Domain ID Collision
Symptom
You see ROS 2 topics that your system is not (yet) publishing
ROS 2 topics don’t go away even after you stop the node that you think is publishing
Solution
By default, all ROS 2 nodes use domain ID 0, so it is possible you are seeing topics from other machines on the network (docs.ros.org).
Solution 1:
You can declare ROS_DOMAIN_ID
on every bash instance inside the
Isaac ROS container.
ROS_DOMAIN_ID=42
Solution 2:
You declare the ROS_DOMAIN_ID
on the host once, and modify the
run_dev.sh
script to carry the environment variable.
vi ${ISAAC_ROS_WS}/src/isaac_ros_common/scripts/run_dev.sh
Find the portion that list all the environment variables to carry over, and add like this.
DOCKER_ARGS+=("-v /tmp/.X11-unix:/tmp/.X11-unix")
DOCKER_ARGS+=("-v $HOME/.Xauthority:/home/admin/.Xauthority:rw")
DOCKER_ARGS+=("-e DISPLAY")
DOCKER_ARGS+=("-e ROS_DOMAIN_ID")
No /opt/ros/humble/install
directory in Docker container
Symptom
You expect the /opt/ros/humble/install
directory to exist inside the
Isaac ROS Dev Docker container, but this directory does not exist.
This is the result of a change made in the Isaac ROS 2.1 release in November 2023.
In Isaac ROS 2.1, all base ROS 2 Humble packages are installed using pre-built Debian packages produced by the NVIDIA ROS 2 Buildfarm.
Previously, these base packages were built from source in a customized global underlay workspace rooted at /opt/ros/humble/
.
Since the base packages are no longer built from source inline within the image, the ./install
subfolder of /opt/ros/humble
is never created.
Solution
Solution 1:
Move any user-specific ROS 2 packages out of the global underlay workspace to an appropriate local overlay workspace, as recommended by the official ROS documentation.
Then, build and source this overlay workspace normally.
Solution 2:
Carefully recreate the customized global underlay workspace by following the steps used in Isaac ROS 2.0 and older releases:
# EXAMPLE: Install OSRF negotiated ROS 2 package from source
ENV ROS_DISTRO=humble
ENV ROS_ROOT=/opt/ros/${ROS_DISTRO}
RUN apt-get update && mkdir -p ${ROS_ROOT}/src && cd ${ROS_ROOT}/src \
&& git clone https://github.com/osrf/negotiated && cd negotiated && git checkout master \
&& source ${ROS_ROOT}/setup.bash \
&& cd negotiated_interfaces && bloom-generate rosdebian && fakeroot debian/rules binary \
&& cd ../ && apt-get install -y ./*.deb && rm ./*.deb \
&& cd negotiated && bloom-generate rosdebian && fakeroot debian/rules binary \
&& cd ../ && apt-get install -y ./*.deb && rm ./*.deb \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
This process involves overriding the behavior of bloom-generate
to force it to produce and install the Debian packages inline.
run_dev.sh
fails to build image with Docker certificate errors
Symptom
Docker build fails building base image with error messages with the following form:
tls: failed to verify certificate: x509: certificate has expired or is not yet valid
More context:
[+] Building 0.2s (2/2) FINISHED docker:default
=> [internal] load build definition from Dockerfile.aarch64 0.0s
=> => transferring dockerfile: 8.45kB 0.0s
=> ERROR [internal] load metadata for nvcr.io/nvidia/nvidia-aarch64-base:ubuntu22.04 0.1s
------
> [internal] load metadata for nvcr.io/nvidia/nvidia-aarch64-base:ubuntu22.04:
------
Dockerfile.aarch64:11
--------------------
9 | # Docker file for aarch64 based Jetson device
10 | ARG BASE_IMAGE="nvcr.io/nvidia/nvidia-aarch64-base:ubuntu22.04"
11 | >>> FROM ${BASE_IMAGE}
12 |
13 | # Store list of packages (must be first)
--------------------
ERROR: failed to solve: nvcr.io/nvidia/nvidia-aarch64-base:ubuntu22.04: failed to resolve source metadata for nvcr.io/nvidia/nvidia-aarch64-base:ubuntu22.04: failed to do request: Head "https://nvcr.io/v2/nvidia/nvidia-aarch64-base/manifests/ubuntu22.04";: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 1969-12-31T17:10:53-08:00 is before 2024-04-24T00:00:00Z
/mnt/nova_ssd/workspaces/isaac_ros-dev/src/isaac_ros_common
Solution
Restarting the Docker daemon should resolve the issue:
sudo systemctl restart docker
Failed to load a NITROS enabled nodes with intraprocess communication enabled
When launching a NITROS-enabled ROS node with explicitly setting use_intra_process_comms
to True
, the following error can be seen.
Symptom
[component_container-1] [ERROR] [1715506914.017761602] [camera_1.camera_1_container]: Component constructor threw an exception: intraprocess communication allowed only with volatile durability
[ERROR] [launch_ros.actions.load_composable_nodes]: Failed to load node 'visual_slam_node' of type 'nvidia::isaac_ros::visual_slam::VisualSlamNode' in container '/camera_1/camera_1_container': Component constructor threw an exception: intraprocess communication allowed only with volatile durability
Solution
NITROS nodes already have intraprocess communication enabled by default, with an exception of few publishers and subscribers that does not participate in data transfer. Disabling intraprocess communication by setting the parameter to False
should resolve the issue and it does not hamper the performance.