1. Introduction
Thanks to smartphones and personal computers, people can now easily upload photos and illustrations to the Internet. Because digital content is not degraded by copying or transmission, unauthorized copying and other abuses must be prevented in order to protect content creators. Digital watermarking is an effective countermeasure against such unauthorized use.
In digital watermarking, secret information is embedded in digital content by making slight changes to the content. In the case of an image, the image in which the information is embedded is called a stego-image, and the embedded information is called a digital watermark. There are two types of digital watermarking: blind and non-blind. The blind method does not require the original image to extract the watermark from the stego-image, whereas the non-blind method does. The blind method is therefore more practical. In addition, because stego-images may be attacked by various kinds of image processing, watermarking methods must be able to extract watermarks from degraded stego-images. Two types of attacks on stego-images can occur: geometric attacks, such as rotation, scaling, and cropping, and non-geometric attacks, such as noise addition and JPEG compression [1].
Neural-network-based watermarking methods have been proposed. In single-stage training, where embedding and extraction are performed in a single network, the network is trained to output a watermark from an input image [2,3]. The overall performance of such a network is low because the relationship between the image and the watermark is trained individually. To improve performance, watermarking methods using autoencoders (AEs) have been proposed [4,5,6]. The layers from the input layer to the middle layer are called the embedding network, and those from the middle layer to the output layer are called the extraction network. Both the original image and the watermark are fed into the input layer of the AE, and an identity mapping is learned so that both are reproduced at the output layer. The stego-image is obtained from the middle layer [6]. Since the original image is unnecessary during extraction, it is often omitted so that only the watermark is output. Furthermore, AEs with convolutional neural networks have been proposed [7]. An adversarial network has also been added to improve image quality [8]. DARI-Mark [9] is a DNN-based watermarking method that uses attention to determine the embedding regions. It can find non-significant regions that are insensitive to the human eye and increases robustness by embedding the watermark with larger intensities. Thus, end-to-end models have been proposed [6,7,8,9,10,11]. However, a huge training dataset is needed to train the connections as the network becomes more complex. Although data augmentation is sometimes introduced, models with internal networks mimicking attacks have been proposed in order to enable training on relatively small datasets [8,10,11].
HiDDeN [8], proposed by Zhu et al., has an attack layer that simulates attacks such as Gaussian blur, per-pixel dropout, cropping, and JPEG compression on images during training. In HiDDeN, JPEG compression is approximated by JPEG-Mask, which sets the high-frequency components of the discrete cosine transform (DCT) coefficients to zero, and by JPEG-Drop, which uses progressive dropout to eliminate the high-frequency DCT coefficients. This implementation therefore does not follow the quantization specified in the JPEG standard. It has also been noted that the JPEG-Mask and JPEG-Drop layers of HiDDeN do not provide sufficient robustness against JPEG compression [12,13]. JPEGdiff [12] approximates the quantized values in JPEG compression with a cubic function. Hamamoto and Kawamura’s method [10] also introduces a layer of additive white Gaussian noise as an attack layer to improve robustness against JPEG compression. Moreover, ReDMark, proposed by Ahmadi et al. [11], has attack layers implementing salt-and-pepper noise, Gaussian noise, JPEG compression, and mean smoothing filters; its JPEG quantization is approximated by adding uniform noise. As described above, the quantization process has been replaced by noise addition in these models, and quantization as specified in the JPEG standard has not been introduced.
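To make the contrast concrete, the following Python sketch (ours, not code from any of the cited systems) compares quantization to the nearest multiple of a step size, as in the JPEG standard, with the additive-uniform-noise surrogate described above; the step size and the sample coefficients are chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_dequantize(coeff, delta):
    """JPEG-style quantization followed by dequantization:
    round the coefficient to the nearest multiple of the step size delta."""
    return np.round(coeff / delta) * delta

def uniform_noise_surrogate(coeff, delta):
    """Training-time surrogate used in noise-based attack layers:
    instead of rounding, add uniform noise in [-delta/2, delta/2],
    i.e., noise with the same range as the true quantization error."""
    return coeff + rng.uniform(-delta / 2, delta / 2, size=np.shape(coeff))

coeff = np.array([37.4, -12.9, 3.2])          # hypothetical DCT coefficients
delta = 10.0                                  # hypothetical quantization step
print(quantize_dequantize(coeff, delta))      # [ 40. -10.   0.]
print(uniform_noise_surrogate(coeff, delta))  # coeff plus bounded noise
```

The surrogate is differentiable with respect to the input, which is why it is convenient inside an attack layer, but it never produces the hard, staircase-like mapping of true quantization.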
Adversarial samples are a problem in the field of pattern recognition. They are generated by adding distortions to images so that the images are misclassified. To avoid such misclassification, a pattern recognition method using JPEG-compressed images has been proposed [15]. JPEG compression is expected to effectively reduce the added noise while preserving the information needed for pattern recognition. However, JPEGdiff has been proposed as a way to break this defense [12]. By approximating the JPEG quantization with a differentiable function, a JPEG-resistant adversarial image can be generated. Therefore, approximating the JPEG quantization with a smooth function may affect the performance of the model.
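For illustration, a differentiable stand-in for rounding of the kind used in JPEGdiff can be sketched as below; the specific cubic form is our assumption based on the description above, not a verified reproduction of the reference implementation.

```python
import numpy as np

def diff_round(x):
    """Approximate round(x) by round(x) + (x - round(x))**3.
    The values stay close to hard rounding, while the cubic term
    provides a usable gradient away from integer inputs."""
    return np.round(x) + (x - np.round(x)) ** 3

x = np.linspace(-1.0, 1.0, 9)
print(np.round(x))     # hard rounding
print(diff_round(x))   # smooth approximation suitable for gradient-based attacks
```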
In our previous work [14], we proposed a quantization activation function (QAF) that can simulate the quantization of JPEG compression according to the standard. That model introduces the QAF into the AE-based model proposed by Hamamoto and Kawamura [6], and better robustness against JPEG compression was obtained than with the original AE-based model [6]. The effectiveness of the QAF was thus demonstrated in our previous work. However, in that model, a QAF with a constant quantization width was used instead of the quantization table. In this paper, we apply the QAF to the attack layer of ReDMark [11], which is a CNN-based model rather than an AE-based one. Furthermore, the proposed method uses a quantization-table-based QAF. Robustness against JPEG compression is expected to be improved by using the QAF. The effectiveness of our method is evaluated by comparing JPEG-compressed images with QAF-applied images. The image quality of the stego-images is also evaluated.
The rest of the paper is organized as follows. In Section 2, the process of JPEG quantization is explained. In Section 3, we describe ReDMark and also address our previous work. In Section 4, we define the quantization activation function and describe the structure of the proposed network. In Section 5, we show the effectiveness of the function and demonstrate the performance of our network through computer simulations. The last section concludes the paper.
Figure 1.
JPEG compression quantization for luminance components. 1) Creation of the quantization table $T_Q$, 2) The quantization process, 3) The dequantization process.
2. Preliminary: JPEG Quantization
JPEG compression is a lossy compression scheme that reduces the amount of information in an image in order to reduce the file size. In this kind of compression, an image is divided into $8 \times 8$-pixel blocks. Then, in each block, the DCT, quantization, and entropy coding are performed sequentially. In JPEG compression, the step that reduces the amount of information is the quantization of the DCT coefficients. We focus on the quantization of the DCT coefficients of the luminance component of an image because the watermark is embedded in these coefficients.
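As a rough sketch of this pipeline (quantization and entropy coding omitted here; quantization is detailed below), a grayscale image whose sides are multiples of 8 can be split into blocks and transformed with the 2-D DCT, for example using SciPy:

```python
import numpy as np
from scipy.fft import dctn

def blockwise_dct(image):
    """Split an 8-bit grayscale image into 8x8 blocks and apply the
    2-D DCT to each block (image sides are assumed to be multiples of 8)."""
    h, w = image.shape
    coeffs = np.empty((h, w), dtype=np.float64)
    for y in range(0, h, 8):
        for x in range(0, w, 8):
            block = image[y:y + 8, x:x + 8].astype(np.float64) - 128.0  # level shift
            coeffs[y:y + 8, x:x + 8] = dctn(block, type=2, norm='ortho')
    return coeffs
```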
Figure 1 shows the quantization process in JPEG compression for the DCT coefficients [16]. The process consists of three steps: 1) creation of the quantization table $T_Q$, 2) the quantization process, and 3) the dequantization process.
During the quantization process, the DCT coefficients are quantized based on a default basic table or a self-defined basic table. The default basic table $B$ is defined as
\[
B =
\begin{pmatrix}
16 & 11 & 10 & 16 & 24 & 40 & 51 & 61\\
12 & 12 & 14 & 19 & 26 & 58 & 60 & 55\\
14 & 13 & 16 & 24 & 40 & 57 & 69 & 56\\
14 & 17 & 22 & 29 & 51 & 87 & 80 & 62\\
18 & 22 & 37 & 56 & 68 & 109 & 103 & 77\\
24 & 35 & 55 & 64 & 81 & 104 & 113 & 92\\
49 & 64 & 78 & 87 & 103 & 121 & 120 & 101\\
72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{pmatrix}.
\]
The quantization table is then determined using the quality factor ($Q$) and the basic table $B$. The quantization table $T_Q(u,v)$ for the quantization level $Q$ at coordinates $(u,v)$ is defined as
\[
T_Q(u,v) = \left\lfloor \frac{S \, B(u,v) + 50}{100} \right\rfloor,
\]
where $\lfloor \cdot \rfloor$ is the floor function and where $B(u,v)$ is the $(u,v)$ component of the basic table $B$. Also, the scaling factor $S$ is given by
\[
S =
\begin{cases}
\dfrac{5000}{Q}, & 1 \le Q < 50,\\[2mm]
200 - 2Q, & 50 \le Q \le 100.
\end{cases}
\]
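The table construction can be sketched in Python as follows; the clamp to a minimum step of 1 follows common encoder implementations (e.g., libjpeg) and is our addition, not part of the equations above.

```python
import numpy as np

# Default basic (luminance) table B.
B = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
], dtype=np.float64)

def quantization_table(Q, basic=B):
    """Build the quantization table T_Q for a quality factor Q in 1..100."""
    S = 5000.0 / Q if Q < 50 else 200.0 - 2.0 * Q   # scaling factor
    T = np.floor((S * basic + 50.0) / 100.0)        # scale the basic table
    return np.maximum(T, 1.0)                       # keep every step at least 1

T50 = quantization_table(50)   # equals the basic table B
T90 = quantization_table(90)   # higher quality -> smaller steps
```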
The quantization process is performed using the quantization table $T_Q$. Let $F_q(u,v)$ be the quantized data, and let $F(u,v)$ be the DCT coefficients in an $8 \times 8$-pixel block. The quantization process is performed as
\[
F_q(u,v) = \mathrm{round}\!\left( \frac{F(u,v)}{T_Q(u,v)} \right),
\]
where
\[
\mathrm{round}(x) = \mathrm{sgn}(x) \left\lfloor |x| + \frac{1}{2} \right\rfloor.
\]
Let $F_q(u,v)$ be the quantized DCT coefficients; then, the dequantization process is performed as
\[
\tilde{F}(u,v) = F_q(u,v) \, T_Q(u,v),
\]
where $\tilde{F}(u,v)$ denotes the dequantized DCT coefficients.
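Continuing the sketch above, the quantization and dequantization steps can be written directly from these equations; the reconstruction error of each coefficient is then bounded by half of its quantization step.

```python
import numpy as np

def quantize(F, T):
    """Quantize DCT coefficients F with table T (round half away from zero)."""
    return np.sign(F) * np.floor(np.abs(F) / T + 0.5)

def dequantize(Fq, T):
    """Dequantize: multiply the quantized values by their step sizes."""
    return Fq * T

# Hypothetical 8x8 block of DCT coefficients and a table from the sketch above.
rng = np.random.default_rng(0)
F = rng.normal(0.0, 50.0, size=(8, 8))
T = quantization_table(75)
F_rec = dequantize(quantize(F, T), T)
assert np.all(np.abs(F - F_rec) <= T / 2)   # per-coefficient error bound
```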
Author Contributions
Conceptualization, M.K.; methodology, M.K. and S.Y.; software, S.Y.; investigation, S.Y.; resources, M.K.; data curation, M.K. and S.Y.; writing—original draft preparation, S.Y.; writing—review and editing, M.K.; visualization, S.Y.; supervision, M.K.; project administration, M.K.; funding acquisition, M.K. All authors have read and agreed to the published version of the manuscript.