2.2. Data Augmentation Model Based on Transfer Learning
As shown in Figure 2, the remote sensing ship target fine-grained recognition data augmentation model presented in this paper consists of three main modules: the Simulated Image Generating (SIG) module, the Foreground Feature Translation Aligning (FFTA) module, and the Background Feature Translation Aligning (BFTA) module. Both the FFTA and BFTA modules are based on the local-aware progressive image conversion network LA-CycleGAN.
The LA-CycleGAN framework comprises two generators and two discriminators. Generator $G$ maps from the source domain $X$ to the target domain $Y$, while generator $F$ reverses this process; discriminator $D_X$ discerns whether an image originates from the source domain, while $D_Y$ assesses whether it belongs to the target domain. Upon convergence, the aim is for the distribution of the migration region in the generated image to align with the feature distribution of the target domain. Nevertheless, the discriminator occasionally has difficulty identifying target-domain images accurately. A local-aware strategy is therefore invoked to alleviate blurring and to capture local regional features and textures more precisely, improving the accuracy and detail-preserving capability of the image transfer. This local-aware progressive transfer approach employs a binary mask $M$ that selects the invariant region through element-wise multiplication with the source-domain image. After the network maps the features of the migration region, the unaltered region $R_1$ is pasted back into the corresponding location of the generated image, so that $R_1$ in the output remains identical to $R_1$ in the input. The transfer operation therefore has no effect on the invariant region, circumventing interference from irrelevant features and maintaining the integrity of $R_1$ (see the sketch below).
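A minimal sketch of this region-preserving step is given below, assuming a trained LA-CycleGAN generator `G` and a binary mask `M` whose value is 1 inside the migration region; the function name and tensor layout are illustrative rather than the authors' implementation.

```python
def local_aware_transfer(G, x, M):
    """x: source-domain image tensor (N, C, H, W); M: binary mask, 1 = migration region, 0 = invariant region R1."""
    y = G(x)                      # full translation by the LA-CycleGAN generator
    # Paste the invariant region R1 back from the source image, so only the
    # masked migration region is actually affected by the transfer.
    return M * y + (1.0 - M) * x
```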
In our system, two principal components work in tandem to achieve feature transfer alignment: the background feature transfer alignment module and the foreground feature transfer alignment module. Each module embeds an LA-CycleGAN network, and the two modules operate sequentially. Their primary objective is to align the simulated image's feature distribution with that of the real-image domain while preserving essential textural and structural attributes.
For the background component, we enhanced the variety of ocean background styles by collecting, via Google Earth, more than 1,000 remote sensing images of oceans and harbors from multiple regions, with varied surface hues and wave patterns. During the background feature alignment phase, the simulated remote sensing images define the source domain, whereas the real ocean backgrounds form the target domain. The migration region corresponds to the large-area background of the synthetic image, and the ship target is the invariant region that remains unchanged. The inputs to the background feature transfer alignment module are the real ocean background imagery, the source-domain synthetic remote sensing images, and the image mask of the migration region. It is essential that this module keep the ship target unchanged while aligning the background features.
The foreground feature transfer alignment module receives as its inputs the ship target images from the real-image domain, the intermediate image produced by the background stage (the intermediate domain), and the corresponding foreground mask marking the ship target's migration region. In foreground alignment, the intermediate image serves as the source domain, and the real-world remote sensing ship image constitutes the target domain; the generator of this module converts the intermediate domain into the real-world domain. During this phase, the ship target is the migration region, while the invariant region corresponds to the intermediate image's background. Throughout this operation, the focus is placed exclusively on the ship target: the ocean background is left untouched, and the cycle consistency loss is computed solely for the ship target, i.e., the foreground object. The chaining of the two stages is sketched below.
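The two-stage pipeline can be written compactly as follows; `bfta_G` and `ffta_G` are assumed names for the trained generators of the background and foreground modules, and the masks are binary tensors with 1 marking the respective migration region.

```python
def augment(sim_image, bg_mask, fg_mask, bfta_G, ffta_G):
    # Stage 1: background alignment; only the background (bg_mask == 1) is
    # translated, the ship target is copied through unchanged.
    intermediate = bg_mask * bfta_G(sim_image) + (1 - bg_mask) * sim_image
    # Stage 2: foreground alignment on the intermediate image; only the ship
    # target (fg_mask == 1) is translated, the aligned background is preserved.
    aligned = fg_mask * ffta_G(intermediate) + (1 - fg_mask) * intermediate
    return aligned
```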
2.3. Remote Sensing Ship Image Harmonization Algorithm
Figure 3 presents the overall architecture of the proposed remote sensing ship image harmonization algorithm, which employs a transfer fusion strategy. Following the principles of adversarial learning, the network is divided into four components: the harmonization image generator, the harmonization image discriminator, the domain encoder, and the domain discriminator. The harmonization image generator produces the harmonized image from the provided input. The harmonization image discriminator evaluates both the generator's output and real remote sensing ship imagery, enabling adversarial learning through the propagation and updating of the adversarial loss. The domain encoder takes the harmonized image, the real remote sensing image, and the corresponding foreground and background masks, and extracts four categories of features. The domain discriminator then judges the correlation between the foreground and background features and back-propagates the corresponding adversarial loss. Together, these components yield the final harmonized composite remote sensing ship image.
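For orientation, the sketch below wires the four components together in one forward pass. The names (`G`, `D_img`, `E_dom`, `D_dom`) and the grouping of the four feature categories as {harmonized, real} × {foreground, background} are assumptions for illustration, not details confirmed by the paper.

```python
def harmonization_forward(G, D_img, E_dom, D_dom, composite, fg_mask, real):
    harmonized = G(composite, fg_mask)     # harmonization image generator
    fake_score = D_img(harmonized)         # harmonization image discriminator
    real_score = D_img(real)
    # Domain encoder: four feature categories from the harmonized and real
    # images, split by the foreground and background masks.
    feats = [E_dom(harmonized, fg_mask), E_dom(harmonized, 1 - fg_mask),
             E_dom(real, fg_mask),       E_dom(real, 1 - fg_mask)]
    # Domain discriminator judges whether foreground and background features
    # follow the same (aligned) distribution.
    domain_scores = [D_dom(f) for f in feats]
    return harmonized, fake_score, real_score, domain_scores
```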
Inspired by U-Net, the generator proposed in this study employs a feature transfer fusion strategy and adopts a simple, symmetric encoder-decoder structure without an additional feature normalization layer. The encoder takes the remote sensing ship simulation image and its corresponding image mask and processes them through a convolution layer followed by three basic blocks, each structured as LeakyReLU-Conv-IN. Besides supporting feature extraction in the convolutional layers, LeakyReLU resists saturation better than ReLU, which mitigates the vanishing-gradient problem and accelerates network convergence. Instance normalization (IN) [15] handles internal covariate shift, improving the model's generalization ability. Owing to the symmetric structure, every basic block in the decoder incorporates an Attention RAIN module, which transfers statistical information from the background features to the normalized foreground features without being influenced by the foreground objects. This allows the decoder to capture correlations between spatially distant features, strengthening important features and reducing information loss and blurring. Finally, through upsampling, the deconvolution layer restores the feature map to the original input image size, which both recovers detailed information and promotes feature fusion.
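A skeleton of such a generator might look as follows. This is a sketch only: the channel widths, kernel sizes, strides, and the channel-wise concatenation of image and mask are assumptions, and an identity module stands in for the Attention RAIN blocks described below.

```python
import torch
import torch.nn as nn

class EncBlock(nn.Sequential):
    """LeakyReLU -> Conv -> IN basic block used in the encoder."""
    def __init__(self, cin, cout):
        super().__init__(nn.LeakyReLU(0.2, inplace=True),
                         nn.Conv2d(cin, cout, 4, stride=2, padding=1),
                         nn.InstanceNorm2d(cout))

class HarmonizationGenerator(nn.Module):
    def __init__(self, attn_rain_blocks=None, base=64):
        super().__init__()
        self.stem = nn.Conv2d(4, base, 4, stride=2, padding=1)  # image (3 ch) + foreground mask (1 ch)
        self.enc = nn.ModuleList([EncBlock(base, base * 2),
                                  EncBlock(base * 2, base * 4),
                                  EncBlock(base * 4, base * 8)])
        # A real Attention RAIN block would also take the (resized) foreground
        # mask; nn.Identity is a stand-in so this skeleton runs on its own.
        self.attn = attn_rain_blocks or nn.ModuleList([nn.Identity()] * 3)
        self.dec = nn.ModuleList([
            nn.ConvTranspose2d(base * 8, base * 4, 4, stride=2, padding=1),
            nn.ConvTranspose2d(base * 4, base * 2, 4, stride=2, padding=1),
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1)])
        self.out = nn.ConvTranspose2d(base, 3, 4, stride=2, padding=1)  # back to input resolution

    def forward(self, composite, fg_mask):
        h = self.stem(torch.cat([composite, fg_mask], dim=1))
        for blk in self.enc:
            h = blk(h)
        for attn, up in zip(self.attn, self.dec):
            h = attn(h)   # style statistics transferred from background to foreground
            h = nn.functional.leaky_relu(up(h), 0.2)
        return torch.tanh(self.out(h))
```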
The Attention RAIN module integrates RAIN with an attention mechanism to adaptively adjust the ship foreground features while preserving the background, thereby achieving complete style harmonization and holistic feature transfer and fusion within the image. From the encoder, the feature map $F_L$ of the remote sensing composite image and the mask $M_L$ of the ship foreground target can be obtained. Taking the $L$-th layer of the module as an example, $H_L$, $W_L$, and $C_L$ denote the height, width, and channel count of the features at layer $L$, respectively; $F_L$ denotes the composite image's feature map at layer $L$, and $M_L$ denotes the ship foreground mask resized to layer $L$. In the adaptive calibration phase for the ship foreground features, the process diverges from RAIN's approach of simply multiplying the module's input feature by its foreground and background masks and then normalizing with IN. Instead, the module applies partial convolution separately to the foreground feature $F_L \odot M_L$ and the background feature $F_L \odot (1-M_L)$, and uses AdaIN to precisely align and calibrate the ship foreground features with the remotely sensed ocean background features. Moreover, to counteract the positional information loss that partial convolution can induce in the foreground and background features, a spatial attention mechanism is introduced: the composite image feature $F_L$ is passed through a 1×1 convolution kernel and an activation function to derive a spatial attention weight, which is then multiplied with the calibrated features and combined to complete the fusion of the ship foreground and remote sensing ocean background details. In the final stage, a 3×3 convolution kernel performs information fusion and dimensionality reduction on the spliced features to produce the output image feature. This design strengthens the interconnection between foreground and background features, enhancing feature extraction and dimensionality reduction while retaining positional information.
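The sketch below approximates the module's data flow under the notation above. It is an illustrative reading, not the authors' code: masked 3×3 convolutions stand in for true partial convolutions, and the sigmoid activation and the exact combination of the attended features are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def masked_stats(feat, mask, eps=1e-5):
    """Per-channel mean/std over the spatial positions selected by `mask`."""
    area = mask.sum(dim=(2, 3), keepdim=True).clamp(min=1.0)
    mu = (feat * mask).sum(dim=(2, 3), keepdim=True) / area
    var = ((feat - mu) ** 2 * mask).sum(dim=(2, 3), keepdim=True) / area
    return mu, (var + eps).sqrt()

class AttentionRAIN(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Masked 3x3 convolutions as a lightweight stand-in for partial convolution.
        self.fg_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.bg_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.attn = nn.Conv2d(channels, 1, 1)                         # 1x1 kernel -> spatial attention map
        self.fuse = nn.Conv2d(channels * 2, channels, 3, padding=1)   # 3x3 fusion / reduction

    def forward(self, feat, fg_mask):
        fg_mask = F.interpolate(fg_mask, size=feat.shape[2:], mode='nearest')
        bg_mask = 1.0 - fg_mask
        f_fg = self.fg_conv(feat * fg_mask)
        f_bg = self.bg_conv(feat * bg_mask)
        # AdaIN-style calibration: re-normalize the foreground statistics to
        # match those of the ocean background.
        mu_fg, std_fg = masked_stats(f_fg, fg_mask)
        mu_bg, std_bg = masked_stats(f_bg, bg_mask)
        f_fg = (f_fg - mu_fg) / std_fg * std_bg + mu_bg
        # Spatial attention from the full composite feature compensates for the
        # positional information lost by the masked convolutions.
        attn = torch.sigmoid(self.attn(feat))
        fused = attn * (f_fg * fg_mask + f_bg * bg_mask)
        # Splice with the original feature, then 3x3 conv for fusion/reduction.
        return self.fuse(torch.cat([fused, feat], dim=1))
```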
The remote sensing ship harmonization algorithm, grounded in the transfer fusion strategy, uses the generator to construct a harmonized image from the synthetic composite image and its corresponding foreground binary mask. The loss function of this algorithm comprises three parts:
First, to account for the domain discrepancy between the input synthetic remote sensing ship image and the real remote sensing ship image, a global fusion loss is introduced to gradually bring the input synthetic image closer to the real-world remote sensing image. Second, the harmonization image discriminator adopts an adversarial learning approach: its adversarial loss takes real remote sensing ship samples and harmonized synthetic ship samples as input to judge whether an image is authentic. Third, a domain discriminator is incorporated into the image harmonization process to determine whether the foreground and background distributions are aligned, and a corresponding loss is applied to the domain transfer from the foreground to the background of the harmonized sample. The overall training objective combines these three terms.
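A schematic of how these three terms could be combined on the generator side is given below. The use of an L1 reconstruction term for the global fusion loss, binary cross-entropy for the adversarial terms, and the weights `lambda_adv` and `lambda_dom` are assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def generator_losses(G, D_img, D_dom, E_dom, composite, fg_mask, real,
                     lambda_adv=1.0, lambda_dom=1.0):
    harmonized = G(composite, fg_mask)
    # 1) Global fusion loss: pull the harmonized output toward the paired real image.
    l_global = F.l1_loss(harmonized, real)
    # 2) Adversarial loss from the harmonization image discriminator
    #    (generator side: the harmonized image should look "real").
    fake_logits = D_img(harmonized)
    l_adv = F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
    # 3) Domain-transfer loss: after harmonization, the foreground features should
    #    be indistinguishable from background features to the domain discriminator.
    dom_logits = D_dom(E_dom(harmonized, fg_mask))
    l_dom = F.binary_cross_entropy_with_logits(dom_logits, torch.ones_like(dom_logits))
    return l_global + lambda_adv * l_adv + lambda_dom * l_dom
```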
Due to the unavailability of a public remote sensing ship image harmonization dataset, this section harnesses the general benchmark synthetic dataset iHarmony4 [16] to train the image harmonization model. To distinguish images at different processing stages, the images produced by the remote sensing ship image harmonization algorithm based on the transfer fusion strategy proposed in this section are designated as harmonized images. The algorithm creates a harmonious linkage between the two independent stages, the background feature transfer alignment module and the foreground feature transfer alignment module; by adjusting features such as the foreground brightness, it ensures that the aligned image as a whole visually resembles real images as closely as possible.