The approach in this article draws on many concepts and methods from the field of image information hiding; therefore, our analysis of related work is confined to that field. The origins of information hiding can be traced back to ancient means of covert communication, such as invisible ink and miniaturized writing. Over time, information hiding has gradually shifted from traditional physical means to digital techniques. Based on the application context, information hiding can be divided into digital steganography and digital watermarking. Generally speaking, the former is mainly used for covert communication, while the latter is used for copyright protection. Under this categorization, our DIH4RSID is more closely related to digital steganography. Research on steganography can be roughly divided into three stages. Early information hiding techniques were mainly based on non-adaptive hiding strategies, the most typical of which is LSB (Least Significant Bit) [
13]. LSB is a steganography method that stores information by modifying the least significant bits of an image's pixel values. Because the human eye is insensitive to such small color differences, secret bits written into the least significant bit of each pixel leave the image visually unchanged. However, non-adaptive steganography does not consider the characteristics of the cover image itself, so it is insecure and easily detected by steganalysis. Adaptive steganography emerged in response, representing the second stage of steganography. It takes into account the properties of the cover image, such as the texture and edge information of the image content. Because modifications in regions with complex texture are difficult to detect, secret information is selectively embedded into cover regions with complex textures or rich edges, which improves the resistance of stego images to steganalysis. The various adaptive steganography algorithms are all combined with STC [
14] encoding; they differ mainly in the design of the distortion function. Representative algorithms include HUGO [15], WOW [16], UNIWARD [17] and HILL [18]. Although adaptive steganography methods have achieved high performance, both the content-adaptive and the statistics-based approaches face several challenges. First, such algorithms can only embed a small number of bits or a short text, and cannot embed multimedia information such as images [
12]. In addition, these methods often require specialized knowledge to design elaborate distortion cost functions. With the continuous development of deep learning-based steganalysis algorithms, the security of these traditional hand-designed information hiding algorithms faces great challenges. This has led researchers to turn to deep learning, attempting to use its powerful feature fusion ability to realize information hiding. Deep learning-based information hiding frameworks, such as HiDDeN [
19] and SteganoGAN [
9], have been developed to accomplish the tasks of hiding and extracting information. This development marks the entry of information hiding into its third stage, known as deep information hiding. These frameworks eliminate the need to design embedding strategies by hand and achieve higher payloads. However, they still only enable the covert transmission of small amounts of data. To address the challenge of hiding large image data, Baluja [
20] presented a system that embeds a full-color image into another image of identical size while minimizing the quality degradation of both images. This is achieved by jointly training deep neural networks for the embedding and extraction processes so that they work as a pair. While this approach is a significant innovation and yields impressive visual results, its robustness against steganalysis is limited. Rehman et al. [
8] developed an encoder-decoder architecture based on convolutional neural networks and trained the whole network end to end with a novel loss function. While this approach preserved the fidelity of the hidden image well, the visual quality of the generated stego image was poor. To further improve the hiding performance, Duan et al. [
21] introduced a reversible information hiding network based on a U-Net architecture. The approach produced good results both in synthesizing stego images and in accurately recovering secret images. Nonetheless, their work did not examine security in depth. Chen et al. [
12] posited that certain secret images might possess intricate spatial characteristics. To address this, they proposed a multi-level robust auxiliary module that enhances the feature representation and thereby improves the restoration quality of secret images. However, because the framework has no discriminator, the performance improvement was limited. Some researchers have introduced attention mechanisms into deep information hiding [
10, 22, 23, 24], and promising results have been achieved. Tan et al. [
23] proposed a new end-to-end image steganography network architecture based on a channel-attention mechanism and generative adversarial networks, which can produce perceptually indistinguishable stego images at different capacities. However, their schemes cannot be used directly to embed and carry remote sensing images of such large size. Based on the above analysis, existing methods cannot be directly applied to the secure distribution of remote sensing images and need to be further adapted to this task.
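To make the non-adaptive LSB embedding discussed at the beginning of this section concrete, the following minimal Python sketch writes one secret bit into the least significant bit of each cover value and reads it back. The function names `lsb_embed` and `lsb_extract`, the pixel values, and the payload are hypothetical and chosen only for illustration; this is a sketch of the general LSB idea, not the implementation of [13] and not a component of DIH4RSID.

```python
def lsb_embed(cover, bits):
    """Write one secret bit into the least significant bit of each cover value."""
    if len(bits) > len(cover):
        raise ValueError("payload is longer than the cover")
    stego = list(cover)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the pixel value, then set it to the secret bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def lsb_extract(stego, n_bits):
    """Read the least significant bits of the first n_bits stego values."""
    return [value & 1 for value in stego[:n_bits]]


# Hypothetical 8-bit grayscale pixel values and payload, for illustration only.
cover = [52, 55, 61, 66, 70, 61, 64, 73]
secret = [1, 0, 1, 1, 0, 0, 1, 0]

stego = lsb_embed(cover, secret)
recovered = lsb_extract(stego, len(secret))

# Largest per-pixel difference between the cover and the stego values.
max_change = max(abs(s - c) for s, c in zip(stego, cover))
```

In this sketch each value changes by at most one gray level, which is why the modification is hard to see. The embedding positions, however, are chosen without regard to image content, which is the weakness that the adaptive methods described above try to address.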