Abstract
Small object detection remains a challenging task in remote sensing image analysis. Small objects occupy only a small fraction of an image and span few pixels, so the downsampling operations in conventional Convolutional Neural Network (CNN) detectors often discard their fine detail and cause missed detections. To address this problem, this paper proposes an improved YOLOv8 algorithm. First, we design an adaptive feature extraction and multi-scale fusion module that enhances feature expressiveness and effectively extracts the detailed information of small objects. Second, we incorporate the AFGCAttention mechanism to strengthen the network's focus on key regions, suppress irrelevant background information, and improve the model's ability to recognize small objects. Third, to counter the resolution loss that affects small object detection, we adopt the CARAFE (Content-Aware ReAssembly of FEatures) upsampling operator. By reassembling feature maps in a content-aware manner, it avoids the blurring and information loss common to traditional upsampling methods and is particularly effective at reconstructing small-object details, yielding clearer and more accurate boundaries. Finally, to improve the accuracy of bounding box regression, we use the GIoU loss function to optimize the geometric match between the target and predicted boxes, which addresses inaccurate bounding box localization in small object detection and improves localization precision. Experimental results show that the proposed algorithm achieves a significant accuracy improvement on small object detection tasks while maintaining high detection robustness in scenes with complex backgrounds. The improved model reaches a mean average precision (mAP) of 83.0% and an accuracy of 85.0%, increases of 2.0% and 5.0%, respectively, over the baseline model.
Compared with existing methods, the proposed approach offers significant advantages in detection accuracy, localization precision, and computational efficiency, with particularly strong performance on small objects.
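
For readers unfamiliar with the GIoU loss mentioned above, the following is a minimal sketch of the standard generalized IoU computation for two axis-aligned boxes. It is illustrative only and is not the implementation used in this work: the function names `giou` and `giou_loss`, the `(x1, y1, x2, y2)` box format, and the assumption that both boxes have positive area are all choices made for this example.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes x2 > x1 and y2 > y1 for both boxes (positive area).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle; width/height clamp to 0 when boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # Penalize the part of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


def giou_loss(pred_box, target_box):
    """Loss in [0, 2): 0 for a perfect match, larger for worse matches."""
    return 1.0 - giou(pred_box, target_box)


if __name__ == "__main__":
    print(giou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(giou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap
    print(giou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes
```

Unlike plain IoU, which is zero for every pair of non-overlapping boxes, the enclosing-box term still varies with the distance between the boxes. This gives the regression a useful signal even when a predicted box misses a small target entirely.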