3.1 YOLOv3-SPP
From YOLOv2 to YOLOv3, the backbone changes from Darknet-19 to Darknet-53, target detection is treated as a regression problem, and the original softmax classifier is replaced by independent logistic classifiers. As the backbone of the whole network, Darknet-53 controls the output size by increasing the stride of its convolution kernels. YOLOv3 outputs three feature maps of different sizes and, borrowing the idea of FPN, uses up-sampling and feature fusion to strengthen the feature extraction ability of the model. YOLOv3-SPP splits the DBL block on the path of the first prediction feature map and inserts the SPP module there. Borrowing the idea of a spatial pyramid, the module uses multiple parallel branches to learn features at different scales and fuses global and local features.
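The parallel-branch idea behind the SPP module can be illustrated with a minimal, framework-free sketch. It assumes stride-1 max pooling with kernel sizes 5, 9, and 13, which are the values commonly used in YOLOv3-SPP configurations but are not stated in this section; the function names `max_pool_same` and `spp` are hypothetical.

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling over a k x k window that keeps the input size.

    Positions outside the map are ignored, which is equivalent to
    padding with negative infinity.
    """
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                fmap[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def spp(fmap, kernel_sizes=(5, 9, 13)):
    """Return the identity branch plus one pooled branch per kernel size.

    Stacking the branches corresponds to the channel-wise concatenation
    that fuses local (small window) and global (large window) features.
    """
    return [fmap] + [max_pool_same(fmap, k) for k in kernel_sizes]


# A small single-channel 13 x 13 map with made-up values.
feature_map = [[(3 * y + 7 * x) % 11 for x in range(13)] for y in range(13)]
branches = spp(feature_map)
```

Because every branch keeps the spatial size of the input, the branches can be concatenated directly; the largest window covers the whole 13 x 13 map at its center, which is how the module brings in global context.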
The YOLOv3 loss function is:
$$L = L_{box} + L_{conf} + L_{cls} \tag{2}$$
In Formula (2), $L_{box}$ is the predicted position loss, and the specific expression is:
In Formula (3), $\lambda_{coord}$ denotes the weight factor; $S^2$ and $B$ denote that a B-scan image is divided into $S \times S$ grids and that each grid produces $B$ candidate anchor boxes. $1_{ij}^{obj}$ denotes whether there is a target in the $j$th anchor box of the $i$th grid: it is 1 if a target exists and 0 otherwise. $x_i$, $y_i$, $w_i$, and $h_i$ are the horizontal and vertical coordinates of the center pixel, the width, and the height of the prediction frame in the $i$th grid, respectively. $\hat{x}_i$, $\hat{y}_i$, $\hat{w}_i$, and $\hat{h}_i$ are the corresponding values for the real frame.
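One form consistent with these definitions is sketched below. It assumes the squared-error coordinate loss of the original YOLO formulation; the symbol names $L_{box}$, $\lambda_{coord}$, $1_{ij}^{obj}$ and the hatted ground-truth quantities are notational assumptions. Some YOLOv3 implementations instead use square roots of the width and height or an extra box-size scaling factor, so this should be read as illustrative rather than as the exact expression used here.

```latex
L_{box} = \lambda_{coord} \sum_{i=1}^{S^2} \sum_{j=1}^{B} 1_{ij}^{obj}
  \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2
       + (w_i - \hat{w}_i)^2 + (h_i - \hat{h}_i)^2 \right]
```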
In Formula (2), $L_{conf}$ is the confidence loss, and the calculation formula is:
In Formula (4), $\lambda_{obj}$ and $\lambda_{noobj}$ denote the loss weights for bounding boxes that do and do not contain a target, respectively; $1_{ij}^{noobj}$ takes the opposite value to $1_{ij}^{obj}$; $C_i$ is the predicted probability that the prediction frame contains a target, and $\hat{C}_i$ is the corresponding real probability.
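A confidence loss consistent with these definitions can be sketched as follows. The sketch assumes a binary cross-entropy form, in line with the cross-entropy loss named in Section 3.3 (the original YOLO used a squared error instead); the symbol names $L_{conf}$, $\lambda_{obj}$, $\lambda_{noobj}$, $C_i$, and $\hat{C}_i$ are notational assumptions.

```latex
L_{conf} = -\lambda_{obj} \sum_{i=1}^{S^2} \sum_{j=1}^{B} 1_{ij}^{obj}
  \left[ \hat{C}_i \ln C_i + (1 - \hat{C}_i) \ln (1 - C_i) \right]
  - \lambda_{noobj} \sum_{i=1}^{S^2} \sum_{j=1}^{B} 1_{ij}^{noobj}
  \left[ \hat{C}_i \ln C_i + (1 - \hat{C}_i) \ln (1 - C_i) \right]
```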
In Formula (2), $L_{cls}$ is the target classification loss, and the calculation formula is:
In Formula (5), $\lambda_{class}$ represents the weight of the classification loss function, $p_i(c)$ represents the predicted probability that the target belongs to class $c$, and $\hat{p}_i(c)$ represents the corresponding real probability.
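A classification loss consistent with these definitions can be sketched as below. It assumes a per-class binary cross-entropy summed over the anchor boxes that contain a target; the symbol names $L_{cls}$, $\lambda_{class}$, $p_i(c)$, $\hat{p}_i(c)$, and the class set written as `classes` are notational assumptions.

```latex
L_{cls} = -\lambda_{class} \sum_{i=1}^{S^2} \sum_{j=1}^{B} 1_{ij}^{obj}
  \sum_{c \in \mathrm{classes}}
  \left[ \hat{p}_i(c) \ln p_i(c) + (1 - \hat{p}_i(c)) \ln (1 - p_i(c)) \right]
```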
3.3 Model Evaluation Index
Target recognition models in the YOLO series are evaluated using the cross-entropy loss L, the average precision (AP), the mean average precision (mAP), and the detection rate in frames per second (FPS).
The cross-entropy loss function measures the difference between the real value and the predicted value; the smaller its value, the better the predictions of the model. The formula is as follows:
In Formula (9), $N$ is the total number of samples, and $y_j$ and $\hat{y}_j$ are the true value and predicted value of the $j$th sample, respectively.
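As a minimal sketch of how such a loss behaves, the function below assumes the binary form $L = -\frac{1}{N}\sum_{j}[y_j \ln \hat{y}_j + (1 - y_j)\ln(1 - \hat{y}_j)]$; whether Formula (9) is the binary or the multi-class variant is not recoverable from the text, and the function name `cross_entropy` and the clipping constant `eps` are hypothetical.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over N samples.

    y_true: true values (0 or 1); y_pred: predicted probabilities.
    Predictions are clipped to [eps, 1 - eps] to avoid log(0).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# Confident, correct predictions give a lower loss than hesitant ones.
good = cross_entropy([1, 0], [0.9, 0.1])
poor = cross_entropy([1, 0], [0.6, 0.4])
```

Here `good` is smaller than `poor`, matching the statement that a smaller loss indicates better predictions.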
AP is defined as the area under the curve of precision P against recall R. The calculation formulas of P, R, and AP are:
$$P = \frac{TP}{TP + FP} \tag{10}$$
$$R = \frac{TP}{TP + FN} \tag{11}$$
$$AP = \int_0^1 P(R)\,dR \tag{12}$$
In the formulas, $TP$ is the number of samples correctly identified, $FP$ is the number of samples mis-detected, and $FN$ is the number of samples missed by the model.
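The sketch below shows one way to compute these quantities from a ranked list of detections. It approximates the area under the precision-recall curve with a simple rectangle rule and no interpolation, which is an assumption; benchmark toolkits often use interpolated variants. The function names and the toy detections are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision P = TP / (TP + FP) and recall R = TP / (TP + FN)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r


def average_precision(detections, num_ground_truth):
    """Area under the precision-recall curve (rectangle rule).

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of real targets, i.e. TP + FN.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_ground_truth
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap


# Three detections for a class with two real targets (made-up values).
toy_detections = [(0.9, True), (0.8, False), (0.7, True)]
toy_ap = average_precision(toy_detections, num_ground_truth=2)
```

Sorting by confidence before accumulating is what traces out the curve: each additional detection lowers the confidence threshold and yields one more (precision, recall) point.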
The mean average precision (mAP) is obtained by averaging the AP values of the N different categories, as shown in Formula (13). The higher the mAP value, the better the performance of the model.
$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i \tag{13}$$
In the formula, $\sum_{i=1}^{N} AP_i$ is the sum of the average precision values of all categories, and $N$ is the total number of categories, which is four here.
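The averaging step can be sketched in a few lines. The class names and AP values below are placeholders chosen only for illustration, not results from this work, and the function name `mean_average_precision` is hypothetical.

```python
def mean_average_precision(ap_per_class):
    """Average the per-category AP values to obtain mAP."""
    if not ap_per_class:
        raise ValueError("at least one category is required")
    return sum(ap_per_class.values()) / len(ap_per_class)


# Four placeholder categories with made-up AP values.
example_ap = {"class_a": 0.90, "class_b": 0.80, "class_c": 0.70, "class_d": 0.60}
example_map = mean_average_precision(example_ap)
```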
In ground-penetrating radar applications, identifying a target quickly allows preventive and remedial measures to be taken in advance. The detection rate is therefore also one of the indicators of model quality. It is measured in frames per second (FPS), the number of images the network processes per second. The formula is:
$$FPS = \frac{n}{t} \tag{14}$$
In Formula (14), $n$ is the total number of frames processed, and $t$ is the time taken to process them.
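A minimal timing sketch of this measurement is given below. The detector is replaced by a hypothetical stand-in, `dummy_detector`, that only sleeps; in practice the model's inference call would take its place, and warm-up runs are usually excluded from the timed loop.

```python
import time


def measure_fps(process_frame, frames):
    """Return n / t, where n frames are processed in t seconds."""
    start = time.perf_counter()
    n = 0
    for frame in frames:
        process_frame(frame)
        n += 1
    t = time.perf_counter() - start
    if n == 0:
        return 0.0
    return n / t if t > 0 else float("inf")


def dummy_detector(frame):
    """Stand-in for model inference; sleeps for about one millisecond."""
    time.sleep(0.001)


fps = measure_fps(dummy_detector, range(20))
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations.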