YOLOv8 is a state-of-the-art, real-time model for perception tasks including object detection, tracking, segmentation, classification, and pose estimation.
YOLOv8 is pre-trained on the COCO dataset to perform object detection out of the box for 80 object classes. It offers various size options (nano/small/medium/large/extra-large) for different use cases. Larger models are generally more accurate but slower at inference. Smaller models are appropriate for resource-constrained devices in edge scenarios, while larger models are useful where accuracy is the priority and more compute is available.
Advantages of YOLOv8:
High speed and accuracy
Adaptable to different hardware platforms, from CPUs to GPUs and cloud
Offers developer-friendly features including easy CLI and Python packages
How YOLOv8 Works
A neural network uses multiple detection heads to resolve objects of different scales in an image. YOLOv8 has three detection heads, each of which looks at the input image with a different stride (8, 16, and 32). For a 640x640 input, it outputs three feature maps at different scales: 80x80, 40x40, and 20x20.
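The relationship between stride and feature-map size can be sketched in a few lines. This is a minimal illustration assuming the common 640x640 input resolution; `strides` follows from the three detection heads described above.

```python
# Sketch: how the three detection-head strides map a 640x640 input
# to the 80x80, 40x40, and 20x20 feature maps (input size assumed).
input_size = 640
strides = (8, 16, 32)  # one stride per detection head

# Each head's grid side length is the input size divided by its stride
grids = [input_size // s for s in strides]
print(grids)  # → [80, 40, 20]

# Total candidate predictions across all three heads
total = sum(g * g for g in grids)
print(total)  # → 8400
```

The `total` here is where the 8400 in the output tensor shape comes from.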
YOLOv8 outputs a tensor of size (1, 8400, 84) for an input image. Here, 1 is the batch size and 8400 is the number of candidate detections from the three feature maps (80x80 + 40x40 + 20x20 = 6400 + 1600 + 400). 84 is the bounding box and class probability information per candidate: four bounding-box values (center x, center y, width, height) plus 80 class probabilities. You can filter candidate detections based on the detection score during post-processing.
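The score-based filtering described above can be sketched with NumPy. The tensor below is random data standing in for real model output, and the 0.5 threshold is an illustrative choice, not a value from the source:

```python
import numpy as np

# Sketch: filtering a YOLOv8-style output tensor of shape (1, 8400, 84),
# i.e. 4 box values (x, y, w, h) + 80 class scores per candidate.
# Random data stands in for a real model's output here.
rng = np.random.default_rng(0)
output = rng.random((1, 8400, 84)).astype(np.float32)

boxes = output[0, :, :4]         # (8400, 4): center x, center y, w, h
class_scores = output[0, :, 4:]  # (8400, 80): per-class probabilities

scores = class_scores.max(axis=1)       # best class score per candidate
class_ids = class_scores.argmax(axis=1) # index of that class

conf_threshold = 0.5  # illustrative threshold (assumption)
keep = scores > conf_threshold
print(boxes[keep].shape, class_ids[keep].shape)
```

The surviving boxes would then go to Non-Maximum Suppression.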
YOLOv8 performs anchor-free detection, which means it predicts an object’s center directly instead of predicting the offset from an anchor box (visualization below). This speeds up the post-processing step of Non-Maximum Suppression (NMS) by reducing the number of candidate predictions.
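To make the NMS step concrete, here is a minimal greedy NMS sketch over boxes in (x1, y1, x2, y2) corner format. The boxes, scores, and IoU threshold are hypothetical inputs; real pipelines run this after the score filtering step and typically use an optimized implementation:

```python
import numpy as np

# Minimal greedy Non-Maximum Suppression sketch (hypothetical inputs).
def nms(boxes, scores, iou_threshold=0.5):
    order = scores.argsort()[::-1]  # candidate indices, best score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the top box with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Drop boxes that overlap the kept box too much
        order = order[1:][iou <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # → [0, 2]: the overlapping second box is suppressed
```

Fewer candidate predictions from anchor-free detection means fewer iterations of this loop, which is the speedup the paragraph above refers to.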
Repositories and Packages
The Isaac ROS implementations of this technology are available here: