4.3. Data
The experimental dataset in this study consisted of 3000 crack images captured by the UAV, which were divided into training and test sets in a 9:1 ratio (2700 training images and 300 test images).
In the preprocessing stage, part of the training set was augmented to improve the generalisation ability of the model. The transformations used for augmentation were chosen so that the augmented images remained close to the collected tunnel crack images: random brightness transformation, random horizontal flipping, and random vertical flipping. The transformed results are shown in Figure 7.
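The three augmentations listed above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function name `augment`, the brightness range `max_delta`, and the flip probability `flip_prob` are assumptions, and the image is represented as a plain nested list of RGB tuples so that the sketch has no dependencies. In a detection setting the box annotations would have to be flipped together with the image, which is omitted here.

```python
import random

def augment(image, rng, max_delta=0.2, flip_prob=0.5):
    """Apply random brightness, horizontal flip and vertical flip.

    image: list of rows, each row a list of (r, g, b) tuples in 0..255.
    max_delta and flip_prob are illustrative values, not from the paper.
    """
    # Random brightness: scale every channel by one factor in [1-d, 1+d].
    factor = 1.0 + rng.uniform(-max_delta, max_delta)
    out = [
        [tuple(min(255, max(0, round(c * factor))) for c in px) for px in row]
        for row in image
    ]
    # Random horizontal flip: reverse the pixel order within each row.
    if rng.random() < flip_prob:
        out = [row[::-1] for row in out]
    # Random vertical flip: reverse the order of the rows.
    if rng.random() < flip_prob:
        out = out[::-1]
    return out
```

Passing an explicit `random.Random(seed)` instance as `rng` keeps the augmentation reproducible between runs.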
Each image was scaled and standardised before being input into the network. The scaled images were 512 × 512 pixels. The per-channel RGB means used for standardisation were (123.675, 116.28, 103.53), and the corresponding standard deviations were (58.395, 57.12, 57.375).
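The standardisation step amounts to subtracting the channel mean and dividing by the channel standard deviation. The sketch below uses the values quoted above, which coincide with the widely used ImageNet statistics expressed on a 0 to 255 scale; the function names are hypothetical, and the resize to 512 × 512 is left out because the interpolation method is not stated.

```python
MEAN = (123.675, 116.28, 103.53)   # per-channel RGB means from the text
STD = (58.395, 57.12, 57.375)      # per-channel RGB standard deviations

def standardise_pixel(rgb):
    """Return (value - mean) / std for each RGB channel of one pixel."""
    return tuple((v - m) / s for v, m, s in zip(rgb, MEAN, STD))

def standardise_image(image):
    """Standardise an image given as rows of (r, g, b) tuples in 0..255."""
    return [[standardise_pixel(px) for px in row] for row in image]
```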
CenterNet determines the target's location by predicting the target centre point, target centre point bias, and target size. Therefore, the corresponding labels of the image include the target centre-point Gaussian heat map, target centre-point bias, and target size, which are represented by a tensor of the same size as the network output.
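The three labels described above can be illustrated for a single annotated box. This is a sketch under stated assumptions, not the labelling code of the study: the output stride of 4 (giving a 128 × 128 map for a 512 × 512 input), the fixed Gaussian `sigma`, and the function name `make_labels` are all assumed, and the box size is expressed in output-map units.

```python
import math

def make_labels(box, out_size=128, stride=4, sigma=2.0):
    """Build CenterNet-style labels for one box (x1, y1, x2, y2) in input pixels.

    stride, out_size and the fixed sigma are assumptions for illustration.
    Returns (heatmap, centre_index, offset, size).
    """
    x1, y1, x2, y2 = box
    # Centre of the box in output-map coordinates (fractional).
    cx, cy = (x1 + x2) / 2 / stride, (y1 + y2) / 2 / stride
    # Integer cell containing the centre, and the bias lost by flooring.
    ix, iy = int(cx), int(cy)
    offset = (cx - ix, cy - iy)
    # Box width and height in output-map units.
    size = ((x2 - x1) / stride, (y2 - y1) / stride)
    # Gaussian peak of 1.0 at the centre cell, decaying with distance.
    heatmap = [
        [math.exp(-((x - ix) ** 2 + (y - iy) ** 2) / (2 * sigma ** 2))
         for x in range(out_size)]
        for y in range(out_size)
    ]
    return heatmap, (ix, iy), offset, size
```

With several boxes in one image, the per-box heat maps are usually merged by taking the element-wise maximum, so that overlapping Gaussians do not exceed 1.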
4.4. Training Process and Experimental Results
To ensure that the comparative experiments were valid, the training parameters were kept identical across all experiments in this study. The initial learning rate was 0.0001, adjusted by cosine annealing down to a minimum learning rate of 0.00001. The batch size was set to 8, and each model was trained for 300 epochs using the SGD optimisation algorithm.
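The learning rate schedule can be written out explicitly. The sketch below assumes a single cosine cycle spanning all 300 epochs with no warm restarts, which the text does not specify; `cosine_lr` is a hypothetical helper name.

```python
import math

def cosine_lr(epoch, total_epochs=300, lr_max=1e-4, lr_min=1e-5):
    """Cosine-annealed learning rate at a given epoch.

    Assumes one cycle over the whole run (an assumption, not stated in the
    text): starts at lr_max at epoch 0 and reaches lr_min at total_epochs.
    """
    cos_term = 1 + math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term
```

Under this assumption the rate falls slowly at first, fastest around the midpoint of training, and flattens again as it approaches the minimum.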
The training experiments were conducted in five groups: the original CenterNet with a ResNet18 backbone, CenterNet with the channel and spatial attention mechanism (CBAM), CenterNet with the feature selection module, CenterNet with the improved target size loss, and CenterNet with all three improvements.
Table 2 compares the performance of CenterNet with the CBAM and feature selection modules added, in terms of FLOPs, FPS, and video memory.
During training, because features are harder to extract from some data than from others, overlaps and omissions occur in some detections, as shown in Figure 8. To address this, the optimised model in this study strengthens feature extraction. After the feature extraction module was added, the overlaps and omissions were markedly reduced, and the data processing accuracy improved.
The controlled experiments were tested in the same environment as that used for training, with the batch size set to one. The ablation experiments are summarised in Table 3, and the following results were obtained:
After the CBAM module was added, the model size increased by 1.4 MB, FPS decreased by 106.6, video memory increased by 2 MB, FLOPs remained unchanged, and AP increased by 0.072 compared with the original model.
After the feature selection module was added, the model size increased by 0.8 MB, FPS decreased by 46.3, video memory increased by 58 MB, FLOPs increased by 3.29, and AP increased by 0.101 compared with the original model.
After IoU optimisation was applied to the original model, the model size increased by 0.5 MB, FPS decreased by 123.7, video memory increased by 31 MB, FLOPs increased by 2.2, and AP increased by 0.021 compared with the original model.
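The IoU optimisation mentioned above refers to the target size loss, whose exact form is not restated in this section. As a reminder of the underlying quantity only, the sketch below computes the intersection over union of two axis-aligned boxes and the common `1 - IoU` loss; treating this as the loss used in the study is an assumption, and both function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred, target):
    """Generic 1 - IoU loss (assumed form, not taken from the paper)."""
    return 1.0 - iou(pred, target)
```

Unlike a per-coordinate L1 loss on width and height, an IoU-based loss is scale-invariant, so small and large cracks contribute comparably.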
After the addition of the feature selection module, the optimised model decreased the target size loss faster than the original CenterNet because the feature selection module can adaptively select the underlying features (such as the target texture and edge information) in the downsampling process and add them to the feature map in the upsampling process. Thus, the target size can be learned more quickly.
The target size loss curves of the original CenterNet and of CenterNet with the feature selection module are shown in Figure 9.
The feature selection module can adapt to underlying features, which is also evident in the actual detection effect. As shown in Figure 10, after adding the feature selection module, the optimised model can predict the crack size more accurately owing to the inclusion of information such as the crack edge.
The CBAM and feature selection modules both reduce the inference speed of the network, the CBAM module most of all. Because a CBAM module is added to every ResBlock, it lowers the FPS throughout the network, whereas the feature selection module causes a smaller reduction. The impact of the two additional modules on video memory usage was relatively small.
The feature information of the entire network is compressed by the subsampling module, which reduces the workload of subsequent training and increases the inference speed of the network. The input information in the upper layers of the network is enhanced by the feature extraction module, and the upsampling stage uses fewer convolutional layers to improve the running speed. The input and output of each layer of the network optimised in this study are listed in Table 4.
To show the performance improvement from optimisation more intuitively, five groups of training processes were randomly selected for comparison, as shown in Figure 11. Dark blue represents the data processing accuracy of the original CenterNet model, and yellow represents the accuracy improvement achieved by the optimised CenterNet-CBAM-FS-IOU model.
After optimisation, the overall accuracy of CenterNet improved to a certain extent, and the model could effectively identify cracks in construction concrete with less training time. The actual detection results are shown in Figure 12, where the red boxes mark the detected cracks and the numbers are the detection numbers.
As a classic anchor-free model in the field of computer vision, the CenterNet model has a wide range of applications and optimisations in various disciplines.
Table 5 lists the AP values of CenterNet and the improved model structure for the dataset. The improvement in the AP values also verifies the effectiveness of the proposed algorithm model optimisation scheme.