isaac_ros_grounding_dino#

Source code available on GitHub.

Quickstart#

Set Up Development Environment#

  1. Set up your development environment by following the instructions in the getting started guide.

  2. (Optional) Install dependencies for any sensors you want to use by following the sensor-specific guides.

    Note

    We strongly recommend installing all sensor dependencies before starting any quickstarts. Some sensor dependencies require restarting the development environment during installation, which will interrupt the quickstart process.

Download Quickstart Assets#

  1. Download quickstart data from NGC:

    Make sure the required command-line tools are installed.

    sudo apt-get install -y curl jq tar
    

    Then, run these commands to download the asset from NGC:

    NGC_ORG="nvidia"
    NGC_TEAM="isaac"
    PACKAGE_NAME="isaac_ros_grounding_dino"
    NGC_RESOURCE="isaac_ros_grounding_dino_assets"
    NGC_FILENAME="quickstart.tar.gz"
    MAJOR_VERSION=4
    MINOR_VERSION=0
    VERSION_REQ_URL="https://catalog.ngc.nvidia.com/api/resources/versions?orgName=$NGC_ORG&teamName=$NGC_TEAM&name=$NGC_RESOURCE&isPublic=true&pageNumber=0&pageSize=100&sortOrder=CREATED_DATE_DESC"
    AVAILABLE_VERSIONS=$(curl -s \
        -H "Accept: application/json" "$VERSION_REQ_URL")
    LATEST_VERSION_ID=$(echo "$AVAILABLE_VERSIONS" | jq -r "
        .recipeVersions[]
        | .versionId as \$v
        | \$v | select(test(\"^\\\\d+\\\\.\\\\d+\\\\.\\\\d+$\"))
        | split(\".\") | {major: .[0]|tonumber, minor: .[1]|tonumber, patch: .[2]|tonumber}
        | select(.major == $MAJOR_VERSION and .minor <= $MINOR_VERSION)
        | \$v
        " | sort -V | tail -n 1
    )
    if [ -z "$LATEST_VERSION_ID" ]; then
        echo "No corresponding version found for Isaac ROS $MAJOR_VERSION.$MINOR_VERSION"
        echo "Found versions:"
        echo "$AVAILABLE_VERSIONS" | jq -r '.recipeVersions[].versionId'
    else
        mkdir -p ${ISAAC_ROS_WS}/isaac_ros_assets && \
        FILE_REQ_URL="https://api.ngc.nvidia.com/v2/resources/$NGC_ORG/$NGC_TEAM/$NGC_RESOURCE/\
    versions/$LATEST_VERSION_ID/files/$NGC_FILENAME" && \
        curl -LO --request GET "${FILE_REQ_URL}" && \
        tar -xf ${NGC_FILENAME} -C ${ISAAC_ROS_WS}/isaac_ros_assets && \
        rm ${NGC_FILENAME}
    fi
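The jq filter above keeps only version IDs of the form MAJOR.MINOR.PATCH whose major version equals `MAJOR_VERSION` and whose minor version does not exceed `MINOR_VERSION`, then takes the newest one. The following is a minimal sketch of that same selection rule on a local list, so it can be checked without network access. The helper name `select_version` and the sample version IDs are hypothetical and not part of the quickstart.

```shell
#!/bin/sh
# Hypothetical helper (not part of the quickstart): read one version ID per
# line on stdin and print the newest MAJOR.MINOR.PATCH entry whose major
# version equals $1 and whose minor version is at most $2.
select_version() {
    major="$1"
    minor="$2"
    grep -E '^[0-9]+\.[0-9]+\.[0-9]+$' \
        | awk -F. -v major="$major" -v minor="$minor" \
            '$1 + 0 == major + 0 && $2 + 0 <= minor + 0' \
        | sort -V \
        | tail -n 1
}

# Made-up version IDs for illustration. 4.1.0 is rejected because its minor
# version is too new, and "latest" is rejected because it is not numeric.
printf '%s\n' 3.2.0 4.0.0 4.0.1 4.1.0 latest | select_version 4 0   # prints 4.0.1
```

If nothing matches, the function prints nothing, which corresponds to the empty `LATEST_VERSION_ID` branch in the download script.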
    

Build isaac_ros_grounding_dino#

  1. Activate the Isaac ROS environment:

    isaac-ros activate
    
  2. Install the prebuilt Debian package:

    sudo apt-get update
    
    sudo apt-get install -y ros-jazzy-isaac-ros-grounding-dino && \
       sudo apt-get install -y ros-jazzy-isaac-ros-grounding-dino-models-install
    
  3. Download the pre-trained Grounding DINO model and set it up by converting the ONNX model to a TensorRT engine plan:

    sudo apt-get update
    
    ros2 run isaac_ros_grounding_dino_models_install install_grounding_dino_models.sh --eula
    

    Note

    The time taken to convert the ONNX model to a TensorRT engine plan varies across platforms. On Jetson AGX Thor, for example, the engine conversion can take 10 to 15 minutes to complete.

Run Launch File#

  1. Continuing inside the Isaac ROS environment, install the following dependencies:

    sudo apt-get update
    
    sudo apt-get install -y ros-jazzy-isaac-ros-examples
    
  2. Run the following launch file to spin up a demo of this package using the quickstart rosbag:

    ros2 launch isaac_ros_examples isaac_ros_examples.launch.py \
        launch_fragments:=grounding_dino \
        interface_specs_file:=${ISAAC_ROS_WS}/isaac_ros_assets/isaac_ros_grounding_dino/quickstart_interface_specs.json \
        model_file_path:=${ISAAC_ROS_WS}/isaac_ros_assets/models/grounding_dino/grounding_dino_model.onnx \
        engine_file_path:=${ISAAC_ROS_WS}/isaac_ros_assets/models/grounding_dino/grounding_dino_model.plan
    
  3. Open a second terminal inside the Isaac ROS environment:

    isaac-ros activate
    
  4. Run the rosbag file to simulate an image stream:

    ros2 bag play -l ${ISAAC_ROS_WS}/isaac_ros_assets/isaac_ros_grounding_dino/quickstart.bag
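If the launch or the rosbag playback fails, it can help to first confirm that the asset paths used in the commands above exist. The following is a minimal sketch of such a check. The function `check_quickstart_assets` is a hypothetical helper rather than an Isaac ROS tool, and it assumes the engine plan has already been generated by the model install step.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of Isaac ROS): report which of the
# asset paths referenced by the quickstart commands are missing under the
# assets directory passed as $1. Returns 0 only if all of them exist.
check_quickstart_assets() {
    assets_dir="$1"
    missing=0
    for rel in \
        isaac_ros_grounding_dino/quickstart_interface_specs.json \
        isaac_ros_grounding_dino/quickstart.bag \
        models/grounding_dino/grounding_dino_model.onnx \
        models/grounding_dino/grounding_dino_model.plan
    do
        # -e accepts both files and directories.
        if [ ! -e "$assets_dir/$rel" ]; then
            echo "missing: $assets_dir/$rel"
            missing=1
        fi
    done
    return "$missing"
}

# Example call, run inside the Isaac ROS environment:
#   check_quickstart_assets "${ISAAC_ROS_WS}/isaac_ros_assets" && echo "all assets found"
```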
    

Visualize Results#

  1. Open a new terminal inside the Isaac ROS environment:

    isaac-ros activate
    
  2. Run the Grounding DINO visualization script:

    ros2 run isaac_ros_grounding_dino isaac_ros_grounding_dino_visualizer.py
    
  3. Open another terminal inside the Isaac ROS environment:

    isaac-ros activate
    
  4. Visualize and validate the output of the package with rqt_image_view:

    ros2 run rqt_image_view rqt_image_view /grounding_dino_processed_image
    

    Your output should look like this:

    [Figure: RQT showing detection of various objects]
  5. Open another terminal inside the Isaac ROS environment:

    isaac-ros activate
    
  6. Dynamically change the prompt using the set_prompt service:

    ros2 service call /set_prompt isaac_ros_grounding_dino_interfaces/srv/SetPrompt "{prompt: 'shopping bag.person.'}"
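The prompt in the service call above ends each label with a period. The following is a minimal sketch of building such a string from separate labels. It assumes this period-terminated convention applies to any set of labels, which this page shows only by example. `build_prompt` is a hypothetical helper, not part of the package.

```shell
#!/bin/sh
# Hypothetical helper: join labels into a single prompt string in which every
# label is followed by a period, mirroring the 'shopping bag.person.' example.
build_prompt() {
    # With no labels, print nothing instead of a lone period.
    [ "$#" -gt 0 ] || return 0
    # printf reuses the format once per argument, so each label gets its period.
    printf '%s.' "$@"
}

build_prompt "shopping bag" person   # prints: shopping bag.person.
echo

# Possible use with the service call shown above:
#   ros2 service call /set_prompt isaac_ros_grounding_dino_interfaces/srv/SetPrompt \
#       "{prompt: '$(build_prompt "shopping bag" person)'}"
```

Labels that contain a single quote would break the YAML quoting in the service call and need escaping first.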
    

Note

Grounding DINO is designed to perform object detection on previously unseen objects without model retraining. As a result, inference requires significant GPU compute. Refer to the model performance figures for your target platform to determine which model to use.

More powerful discrete GPUs can outperform all other platforms for this task and should be preferred when higher performance is required. Interleaving object detection with other tasks, rather than running it continuously, can also be an effective solution. Finally, if runtime performance is critical and offline training resources are available, developers can train RT-DETR on their own target objects, using synthetic data generation, real-world data, or both, for faster object detection.

Troubleshooting#

Isaac ROS Troubleshooting#

For solutions to problems with Isaac ROS, see troubleshooting.

Deep Learning Troubleshooting#

For solutions to problems with using DNN models, see the deep learning troubleshooting guide.

API#

Usage#

ros2 launch isaac_ros_grounding_dino isaac_ros_grounding_dino.launch.py \
    model_file_path:=<path to .onnx> \
    engine_file_path:=<path to .plan> \
    input_image_width:=<input image width> \
    input_image_height:=<input image height> \
    network_image_width:=<network image width> \
    network_image_height:=<network image height> \
    input_tensor_names:=<input tensor names> \
    input_binding_names:=<input binding names> \
    output_tensor_names:=<output tensor names> \
    output_binding_names:=<output binding names> \
    tensorrt_verbose:=<TensorRT verbosity> \
    force_engine_update:=<force TensorRT update> \
    confidence_threshold:=<confidence threshold>

GroundingDinoPreprocessorNode#

ROS Parameters#

| ROS Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| default_prompt | string | '' | The default text prompt to use for object detection. |
| service_call_timeout | int | 5 | The timeout (in seconds) for the GetTextTokens service call. |
| service_discovery_timeout | int | 5 | The timeout (in seconds) for the GetTextTokens service discovery. |

ROS Topics Subscribed#

| ROS Topic | Interface | Description |
| --- | --- | --- |
| image_tensor | isaac_ros_tensor_list_interfaces/TensorList | The tensor that contains the encoded image data. |

ROS Topics Published#

| ROS Topic | Interface | Description |
| --- | --- | --- |
| tensor_pub | isaac_ros_tensor_list_interfaces/TensorList | Tensor list containing encoded image and text tensors. |

ROS Services Advertised#

| ROS Service | Interface | Description |
| --- | --- | --- |
| set_prompt | isaac_ros_grounding_dino_interfaces/SetPrompt | Service to set a new text prompt for object detection. |

ROS Services Requested#

| ROS Service | Interface | Description |
| --- | --- | --- |
| get_text_tokens | isaac_ros_grounding_dino_interfaces/GetTextTokens | Service to tokenize text prompts and generate positive maps. |
| sync_data_with_decoder | isaac_ros_grounding_dino_interfaces/SyncDataWithDecoder | Service to synchronize class IDs and positive maps between the pre-processor and decoder nodes. |

GroundingDinoTextTokenizer#

ROS Services Advertised#

| ROS Service | Interface | Description |
| --- | --- | --- |
| get_text_tokens | isaac_ros_grounding_dino_interfaces/GetTextTokens | Service to tokenize text prompts into BERT tokens and generate positive maps for label-to-token mapping. |

GroundingDinoDecoderNode#

ROS Parameters#

| ROS Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| boxes_tensor_name | string | boxes | The name of the boxes tensor binding in the input tensor list. |
| scores_tensor_name | string | scores | The name of the scores tensor binding in the input tensor list. |
| confidence_threshold | double | 0.5 | The minimum score required for a particular bounding box to be published. |
| image_width | int | 640 | The width of the resized image, for scaling the final bounding box to match the original dimensions. |
| image_height | int | 480 | The height of the resized image, for scaling the final bounding box to match the original dimensions. |
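To illustrate how `confidence_threshold`, `image_width`, and `image_height` might interact, the sketch below filters detections by score and scales box values to pixel units. It is not the decoder node's implementation. It assumes boxes arrive normalized to [0, 1] in center/size form (cx, cy, w, h), which this page does not state. The function name `filter_and_scale` and the sample detections are hypothetical.

```shell
#!/bin/sh
# Illustrative only: NOT the decoder node's implementation.
# Assumed input, one detection per line: score cx cy w h
# with box values normalized to [0, 1] in center/size form (an assumption).
# $1 = confidence threshold, $2 = image width, $3 = image height.
filter_and_scale() {
    threshold="$1"
    width="$2"
    height="$3"
    awk -v t="$threshold" -v iw="$width" -v ih="$height" '
        # Keep detections whose score meets the threshold, then scale
        # horizontal values by the width and vertical values by the height.
        $1 + 0 >= t + 0 {
            printf "%.2f %.1f %.1f %.1f %.1f\n",
                $1, $2 * iw, $3 * ih, $4 * iw, $5 * ih
        }'
}

# Made-up detections; 0.5, 640, and 480 are the parameter defaults listed above.
printf '%s\n' '0.90 0.5 0.5 0.2 0.4' '0.30 0.1 0.1 0.1 0.1' \
    | filter_and_scale 0.5 640 480   # prints: 0.90 320.0 240.0 128.0 192.0
```

The second detection is dropped because its score of 0.30 is below the threshold of 0.5.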

ROS Topics Subscribed#

| ROS Topic | Interface | Description |
| --- | --- | --- |
| tensor_sub | isaac_ros_tensor_list_interfaces/TensorList | The tensor that represents the inferred aligned bounding boxes and scores. |

ROS Topics Published#

| ROS Topic | Interface | Description |
| --- | --- | --- |
| detections_output | vision_msgs/Detection2DArray | Aligned image bounding boxes with detection class. |

ROS Services Advertised#

| ROS Service | Interface | Description |
| --- | --- | --- |
| sync_data_with_decoder | isaac_ros_grounding_dino_interfaces/SyncDataWithDecoder | Service to receive and synchronize class IDs and positive maps from the pre-processor node. |