Synthetic Aperture Radar (SAR), with its all-weather, day-and-night imaging capability, has become one of the most important tools for terrestrial observation. It transmits electromagnetic pulses toward the target area through an antenna, receives the echoes reflected back from the target area, and forms images by comparing the received and transmitted pulses and exploiting the Doppler effect. SAR operates in electromagnetic wavebands that penetrate clouds and dust, which allows it to provide remote sensing images under complex weather conditions. With airborne and spaceborne SAR, high-resolution SAR images of the ocean can be obtained, in which ship targets and their wakes are clearly visible. Therefore, ship detection systems using SAR have been widely used in maritime surveillance and play an increasingly important role [1,2,3]. Among ship target detection methods, Constant False Alarm Rate (CFAR) detection is one of the classical algorithms widely used for ship targets; it detects ships by modelling the statistical distribution of the background clutter [
4]. This traditional algorithm is suitable for SAR images with simple backgrounds but performs poorly on images with complex backgrounds. In 2012, AlexNet, proposed by Alex Krizhevsky et al. [
5], won the ImageNet image recognition competition by a large margin over the second-place support vector machine (SVM) approach. After this, convolutional neural networks (CNNs) received renewed attention. However, deep CNNs are prone to the vanishing gradient problem. In 2015, Kaiming He et al. proposed ResNet [6], a network built from residual blocks that alleviate vanishing gradients; it has had a profound impact on the design of subsequent deep neural networks. With the development of deep learning, current deep learning algorithms far surpass traditional machine learning algorithms. Applying deep learning to image processing can significantly improve detection accuracy and speed for tasks such as target detection and instance segmentation [
5]. Currently, many authors have applied deep learning to SAR ship target detection. Kang et al. [
7] proposed a multilayer fusion convolutional neural network based on contextual regions and verified that contextual information affects the performance of neural networks in recognising ships in SAR images. Jiao et al. [8] fused features of different resolutions through dense connections to address the multi-scale, multi-scene SAR ship detection problem. Cui et al. [9] integrated feature pyramids with convolutional block attention modules to combine salient features with globally unambiguous features, improving the accuracy of ship detection in SAR images. To improve the detection speed for ships in SAR images, Zhang et al. [10] proposed a high-speed ship detection method based on a grid convolutional neural network (G-CNN). Qu et al. [11] proposed an anchor-free detection model based on mask-guided features to reduce computational cost and improve ship detection performance in SAR images. Sun et al. [12] proposed a model based on a densely connected deep neural network with an attention mechanism (Dense-YOLOv4-CBAM) to enhance the propagation of image features. Building on YOLOv4, Liu et al. [13] proposed SAR-Net, which obtains multi-scale semantic information through a feature pyramid network (FPN) [6] and balances the correlation of that multi-scale semantic information with scale-equalizing pyramid convolution (SEPC). Wang et al. [14] added multi-scale convolution and a transformer module to YOLO-X to improve its ship detection performance. In the FBR-Net network proposed by Fu et al. [15], the designed ABP structure uses layer-wise and spatial attention to balance the semantic information of the features at each level, making the network focus more on small ships. To improve the detection of small ships in SAR images with complex backgrounds, Guo et al. [16] combined feature refinement, feature fusion, and head enhancement to design a high-precision detector called CenterNet++. Considering that contextual information is crucial for detecting small, densely packed ships, Zhao et al. [17] proposed a new CNN-based method that first proposes as many candidate small ships as possible and then uses contextual information to exclude spurious ships from the predictions, improving the accuracy of ship detection in SAR images.
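As a point of reference for the classical CFAR approach discussed above, the following is a minimal sketch of a cell-averaging CFAR (CA-CFAR) detector. The function name, window sizes, and the assumption of exponentially distributed clutter are illustrative choices on our part, not details taken from the cited works.

```python
import numpy as np

def ca_cfar(image, guard=1, train=4, pfa=1e-3):
    """Cell-averaging CFAR: flag pixels whose intensity exceeds an
    adaptive threshold estimated from the surrounding clutter cells."""
    h, w = image.shape
    r = guard + train                                    # half-width of full window
    n_train = (2 * r + 1) ** 2 - (2 * guard + 1) ** 2    # number of training cells
    # Threshold factor for exponentially distributed clutter (standard CA-CFAR)
    alpha = n_train * (pfa ** (-1.0 / n_train) - 1.0)
    detections = np.zeros_like(image, dtype=bool)
    for i in range(r, h - r):
        for j in range(r, w - r):
            # Clutter estimate: full window minus the guard block (which
            # contains the cell under test and its immediate neighbours).
            window_sum = image[i - r:i + r + 1, j - r:j + r + 1].sum()
            guard_sum = image[i - guard:i + guard + 1,
                              j - guard:j + guard + 1].sum()
            clutter_mean = (window_sum - guard_sum) / n_train
            detections[i, j] = image[i, j] > alpha * clutter_mean
    return detections
```

Because the threshold adapts to the local clutter mean, the design false-alarm rate `pfa` is held roughly constant across regions of differing clutter power, which is precisely the property that makes CFAR attractive for simple sea backgrounds and fragile in complex, heterogeneous ones.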
All of the above studies have contributed to improving the accuracy of ship target detection in SAR images, but the following problems remain:
1. Most of these SAR ship detection frameworks are designed for small-target ships, and the performance of recognising multi-scale ships was not well considered in their design. Consequently, detection accuracy decreases when multi-scale ships are present in a SAR image.
2. Some networks use complex feature fusion in the neck, fusing features extracted by the high-level convolutions of the backbone network, while the semantic details of ships extracted by the low-level convolutions are easily “drowned out” by the stacked convolutions, which is unfavourable for ship detection.
To address these problems, this paper makes the following contributions:
1. A new convolutional block, which we name AMMRF, is proposed. For SAR images containing ships, it obtains feature information from different receptive fields and filters this information, making the network focus more on information useful for ship detection.
2. Adding AMMRF to the backbone network of YOLOv7 makes the backbone more capable, as feature fusion is then completed within the backbone itself. We therefore modified the neck of YOLOv7 by removing its complex feature fusion, and we name the new detection framework YOLO-SARSI.
3. YOLO-SARSI has only 18.43M parameters, 16.36M fewer than YOLOv7. Even so, its detection accuracy for ship targets in SAR images is still higher than that of YOLOv7.
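The AMMRF block itself is detailed later in this paper. Purely as an illustration of the general multi-receptive-field idea behind it, the sketch below applies parallel dilated convolutions to a single-channel feature map and weights the branches with a simple softmax attention over their mean responses; every name and design choice here is our illustrative assumption, not the actual AMMRF design.

```python
import numpy as np

def dilated_conv2d(x, k, dilation=1):
    """Valid-mode 2-D cross-correlation of a single-channel map x with
    kernel k, with an optional dilation rate to enlarge the receptive field."""
    kh, kw = k.shape
    eh = (kh - 1) * dilation + 1   # effective kernel height
    ew = (kw - 1) * dilation + 1   # effective kernel width
    H, W = x.shape
    out = np.zeros((H - eh + 1, W - ew + 1))
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * x[i * dilation:i * dilation + out.shape[0],
                               j * dilation:j * dilation + out.shape[1]]
    return out

def multi_rf_block(x, kernels, dilations=(1, 2, 3)):
    """Toy multi-receptive-field block: one branch per dilation rate,
    fused by softmax attention over each branch's global mean response."""
    branches = [dilated_conv2d(x, k, d) for k, d in zip(kernels, dilations)]
    # Crop all branches to the smallest spatial size so they can be fused.
    hmin = min(b.shape[0] for b in branches)
    wmin = min(b.shape[1] for b in branches)
    branches = [b[:hmin, :wmin] for b in branches]
    # Branch attention: strongly responding receptive fields are weighted up.
    scores = np.array([b.mean() for b in branches])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return sum(wi * b for wi, b in zip(w, branches))
```

In a real detector the branches would be learned multi-channel convolutions and the gating a learned attention module; the sketch only shows how responses from several receptive fields can be extracted in parallel and selectively recombined before reaching the detection head.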