2.2. Analysis of Watermarking Techniques
Based on the embedding domain and the accuracy requirements of the watermarked data, watermarking techniques fall into several categories: spatial domain-based watermarking, frequency domain-based watermarking, reversible watermarking, lossless watermarking, and zero-watermarking. Techniques that embed information in the spatial domain (the coordinates of objects) include [3,4,5,6,7,8,9], and techniques that embed information in the frequency domain include [10,11,12]. Reversible watermarking techniques for vector geographic data are presented in [13,14,15,16,17,18]. Lossless watermarking based on the storage order of the data is studied in [19,20], and zero-watermarking techniques are studied in [21,22,23,24,25,26].
Watermarking algorithms typically embed some information (the watermark) into the data for copyright protection. The watermark can later be detected and extracted to prove the owner's copyright. Embedding relies on finding redundant space in the data, so that the watermark can be inserted with little or no effect on the data being protected. The techniques most widely used today for inserting or hiding such information in other information include spectral modulation, least significant bit (LSB), and quantization index modulation (QIM) [27]. Note also that the watermark does not have to be embedded in the carrier data at all. Techniques of this kind are called zero-watermarking.
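To make QIM concrete, the following is a minimal sketch of a common two-lattice (dither) form of QIM applied to a single coordinate value. It is not the method of [27] or of this study. The step size `delta` and the function names `qim_embed` and `qim_extract` are hypothetical. Bit 0 snaps the coordinate to the lattice of multiples of `delta`, and bit 1 snaps it to the same lattice shifted by `delta / 2`. Extraction is blind: it only checks which lattice the coordinate is closer to.

```python
def qim_embed(value: float, bit: int, delta: float) -> float:
    """Move a coordinate onto the nearest point of the lattice assigned to `bit`.

    Bit 0 uses the lattice {k * delta}; bit 1 uses {k * delta + delta / 2}.
    The coordinate moves by at most delta / 2.
    """
    offset = 0.0 if bit == 0 else delta / 2.0
    return round((value - offset) / delta) * delta + offset


def qim_extract(value: float, delta: float) -> int:
    """Blind extraction: return the bit whose lattice lies closest to `value`."""
    d0 = abs(value - qim_embed(value, 0, delta))
    d1 = abs(value - qim_embed(value, 1, delta))
    return 0 if d0 <= d1 else 1


# Usage: embed bit 1 into an x coordinate with a hypothetical step of 0.01 map units.
x_marked = qim_embed(105.3712, 1, 0.01)
print(x_marked, qim_extract(x_marked, 0.01))
```

With this design, the bit survives any perturbation smaller than `delta / 4`. A larger `delta` therefore increases robustness, but it also increases the maximum distortion `delta / 2`.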
According to how well the watermark survives attacks on the watermarked data, watermarking algorithms can be classified as robust or fragile. Robust watermarking is suited to copyright protection, while fragile watermarking is suited to integrity and authenticity verification.
According to whether the watermark can be seen on the watermarked data, watermarking techniques are classified as visible or invisible. For multimedia data such as images or videos, visible watermarking is an effective way to identify copyright explicitly. Vector geographic data, however, serves not only visual presentation but also many spatial analyses, so watermarking techniques for vector map data are mainly invisible: the watermark is not seen by users.
According to how the watermark is embedded, watermarking algorithms can be divided into spatial domain-based and frequency domain-based watermarking. Spatial domain-based watermarking inserts the watermark into the vertex coordinates or into the relations between geometric features. Frequency domain-based watermarking proceeds in three steps. First, it transforms the geometric features from the spatial domain to the frequency domain using a transform such as the Fourier, cosine, or wavelet transform. Next, it inserts the watermark into the coefficients. Finally, it converts the result back to the spatial domain with the corresponding inverse transform.
According to whether the original data is needed to verify the watermarked data, watermarking algorithms are divided into blind and non-blind watermarking. Blind watermarking extracts the watermark from the watermarked data alone. Non-blind watermarking requires both the original data and the watermarked data.
According to their effect on the data and whether the original data can be recovered from the watermarked data, watermarking algorithms are classified as lossy or lossless. Conventional watermarking techniques distort the original data, and in some fields (e.g., geodesy, cartography, and military applications) this distortion may be unacceptable. Lossless watermarking preserves the original data and is therefore suitable for protecting the copyright of high-accuracy vector geographic data.
As analyzed in [19], current studies of lossless watermarking for vector geographic data fall into three main categories. The first is reversible watermarking. It protects copyright by embedding a watermark in the original data in such a way that the original data can be recovered from the watermarked data when the watermark is extracted. Unfortunately, reversible watermarking has an obvious defect: the watermark can only be used once, because the reversible watermark information must be discarded after extraction. Therefore, this technique does not satisfy the requirements of copyright protection.
The second category is zero-watermarking, where the watermark is generated from characteristics of the data without any modification of the original data. The generated watermark is stored in an IPR (Intellectual Property Rights) repository as future proof of copyright. The key to this method is extracting stable characteristics of the original data that resist various types of attacks. Two popular approaches extract statistical features and geometric features, respectively. As an example of a statistical feature, the map can be divided into rings by concentric circles, and the number of vertices in each ring is then counted. These characteristics are then combined with copyright information to form the zero-watermark. Compared with reversible watermarking, zero-watermarking is completely lossless. However, because the watermark is not actually embedded in the data, proving copyright may depend on trust in a third party.
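The concentric-ring statistic can be sketched as follows. This is one possible reading of the example, not the exact feature used in the cited works. Three details are assumptions: the rings are centred on the vertex centroid, they have equal width up to the farthest vertex, and the counts are binarized by comparing neighbouring rings. The helper names `ring_counts` and `ring_feature_bits` are hypothetical. Because radii are normalized by the largest radius, the counts do not change under translation, rotation, or uniform scaling, apart from vertices lying exactly on a ring boundary.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def ring_counts(vertices: Sequence[Point], n_rings: int) -> List[int]:
    """Count vertices per concentric ring around the vertex centroid.

    Rings have equal width and the outermost ring ends at the farthest vertex,
    so counts are unchanged by translation, rotation, and uniform scaling
    (apart from vertices lying exactly on a ring boundary).
    """
    cx = sum(x for x, _ in vertices) / len(vertices)
    cy = sum(y for _, y in vertices) / len(vertices)
    radii = [math.hypot(x - cx, y - cy) for x, y in vertices]
    r_max = max(radii)
    counts = [0] * n_rings
    for r in radii:
        # Normalized radius in [0, 1] -> ring index; r == r_max goes to the last ring.
        idx = min(int(r / r_max * n_rings), n_rings - 1) if r_max > 0 else 0
        counts[idx] += 1
    return counts


def ring_feature_bits(counts: Sequence[int]) -> List[int]:
    """Hypothetical binarization: bit i is 1 if ring i holds more vertices than ring i + 1."""
    return [1 if counts[i] > counts[i + 1] else 0 for i in range(len(counts) - 1)]
```

Practical schemes typically use many more rings, or combine several statistical and geometric features, so that the resulting bit sequence is long enough to characterize a whole map.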
The third category is watermarking based on storage features. This method embeds the watermark by changing the storage order of the vector data without changing any coordinate values. It thereby avoids both the single-use limitation of reversible watermarking and the reliance on a third party in zero-watermarking. In this technique, the storage direction of each polyline is quantized to 0 or 1 according to a storage characteristic of that polyline. To embed a watermark bit, the algorithm checks whether the bit matches the quantized value of the polyline's storage direction. If they match, the storage order of the polyline is left unchanged. Otherwise, the storage order is reversed. Unlike zero-watermarking, this technique actually embeds the watermark in the data.
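The reverse-if-mismatch rule can be sketched as follows. The quantization rule is an assumption, since the text does not specify the storage characteristic. Here the direction bit is 1 when the first vertex precedes the last vertex in (x, y) order. Polyline `i` carries watermark bit `i % len(bits)`, and extraction takes a majority vote. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Polyline = List[Point]


def direction_bit(line: Sequence[Point]) -> int:
    """Hypothetical quantization: 1 if the first vertex precedes the last in (x, y) order."""
    return 1 if tuple(line[0]) < tuple(line[-1]) else 0


def embed_storage_order(lines: Sequence[Sequence[Point]], bits: Sequence[int]) -> List[Polyline]:
    """Reverse a polyline only when its direction bit differs from the bit assigned to it.

    For simplicity, polyline i carries watermark bit i % len(bits). Closed
    polylines (first vertex == last vertex) are skipped because reversing them
    does not change this direction bit.
    """
    out: List[Polyline] = []
    for i, line in enumerate(lines):
        line = list(line)  # copy; coordinate values are never modified
        if line[0] != line[-1] and direction_bit(line) != bits[i % len(bits)]:
            line.reverse()
        out.append(line)
    return out


def extract_storage_order(lines: Sequence[Sequence[Point]], n_bits: int) -> List[int]:
    """Recover each bit by majority vote over the polylines that carry it."""
    votes = [[0, 0] for _ in range(n_bits)]
    for i, line in enumerate(lines):
        if line[0] != line[-1]:
            votes[i % n_bits][direction_bit(line)] += 1
    return [1 if ones > zeros else 0 for zeros, ones in votes]
```

Indexing bits by feature position keeps the sketch short. However, it breaks if features are reordered or deleted, which is why practical schemes use a content-based mapping from bits to features.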
The watermark itself can be either meaningful or meaningless. Meaningful watermarks are usually logo images that are easy to inspect visually once extracted. Meaningless watermarks are usually pseudo-random bit sequences. The presence of a meaningless watermark is usually detected with a statistical correlation measure (e.g., the Pearson correlation coefficient). In addition, meaningless watermarks are usually much shorter than meaningful watermarks, so each bit can be embedded more times. As a result, they increase the robustness of watermarking techniques and are often used for small data sets. To improve robustness and to secure the watermark, the watermark is often scrambled, and may also be encrypted with a cryptosystem, before being embedded in the data. One contribution of this study is a technique for generating a meaningful watermark that has the characteristics of a meaningless watermark. It can therefore be detected and verified automatically with high accuracy, while still being verifiable visually by humans.
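Key-based scrambling and correlation-based detection can be sketched in a few lines. This does not reproduce the study's own watermark-generation technique. The sketch uses a key-seeded permutation from Python's `random` module as the scrambler. The input bits could be, for illustration, a flattened binary logo. The detection threshold of 0.5 and the helper names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def scramble(bits: Sequence[int], key: int) -> List[int]:
    """Permute the bits with a key-seeded pseudo-random permutation."""
    order = list(range(len(bits)))
    random.Random(key).shuffle(order)
    return [bits[j] for j in order]


def unscramble(bits: Sequence[int], key: int) -> List[int]:
    """Invert scramble() by regenerating the same permutation from the key."""
    order = list(range(len(bits)))
    random.Random(key).shuffle(order)
    out = [0] * len(bits)
    for i, j in enumerate(order):
        out[j] = bits[i]
    return out


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either sequence is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def is_present(extracted: Sequence[int], reference: Sequence[int], threshold: float = 0.5) -> bool:
    """Hypothetical detector: declare the watermark present above a correlation threshold."""
    return pearson(extracted, reference) >= threshold
```

A few bit errors introduced by an attack only lower the correlation moderately, so a threshold test still detects the watermark. Visual inspection of a damaged logo, by contrast, depends on a human judgment.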
2.2.1. Spatial Domain-Based Techniques
These techniques use the spatial domain to embed watermark bits into the coordinates of vertices or into other features such as distances, angles, or even the storage order of vector geographic data. Algorithms need to be robust against common attacks such as feature deletion/addition, feature compression (simplification), and vertex deletion/addition. To achieve this, they embed each watermark bit multiple times in the data, which requires a one-to-many mapping from watermark bits to vertices: each watermark bit is mapped to many vertices. This mapping must spread the watermark bits uniformly over the entire data set to withstand attacks such as data clipping. One contribution of this study is the construction of such a mapping.
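As an illustration of a one-to-many mapping, the following sketch uses a generic keyed-hash approach. It is not the mapping proposed in this study. Each vertex is assigned to a watermark-bit index by hashing its coarse grid cell together with a secret key. The assignment therefore depends only on the vertex itself, not on its position in the file. The grid size `cell`, the key format, and the function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def bit_index(x: float, y: float, key: str, n_bits: int, cell: float = 1.0) -> int:
    """Map a vertex to a watermark-bit index with a keyed hash of its coarse grid cell.

    `cell` should be much larger than the embedding distortion, so embedding
    rarely moves a vertex into another cell (vertices close to a cell edge can).
    """
    cx, cy = int(x // cell), int(y // cell)
    digest = hashlib.sha256(f"{key}:{cx}:{cy}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_bits


def map_bits_to_vertices(vertices: Sequence[Point], key: str, n_bits: int,
                         cell: float = 1.0) -> Dict[int, List[int]]:
    """One-to-many mapping: watermark bit index -> ids of the vertices that carry it."""
    groups: Dict[int, List[int]] = {i: [] for i in range(n_bits)}
    for vid, (x, y) in enumerate(vertices):
        groups[bit_index(x, y, key, n_bits, cell)].append(vid)
    return groups
```

Suppose part of the data is clipped away. The surviving vertices keep their bit indices, and a well-mixing hash spreads every bit over the whole extent. As long as enough vertices survive, each bit can still be recovered by majority vote.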
Watermark bits can be embedded into the vertices of vector geographic data with techniques such as least significant bit, quantization index modulation, or storage order.
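Of these, least significant bit embedding is the simplest. A minimal sketch, assuming coordinates are first quantized to fixed-point integers, is shown here. The `scale` (e.g., 1000 for millimetre steps on metre coordinates) and the bit position `k` are hypothetical parameters.

```python
def lsb_embed(coord: float, bit: int, scale: int = 1000, k: int = 0) -> float:
    """Write `bit` into bit position k of the coordinate's fixed-point integer.

    The coordinate is quantized as round(coord * scale), so scale=1000 means
    millimetre steps for coordinates in metres. Distortion is at most
    (2**k + 0.5) / scale.
    """
    q = round(coord * scale)
    q = (q & ~(1 << k)) | (bit << k)  # clear bit k, then set it to `bit`
    return q / scale


def lsb_extract(coord: float, scale: int = 1000, k: int = 0) -> int:
    """Read bit position k of the coordinate's fixed-point integer."""
    return (round(coord * scale) >> k) & 1
```

Plain LSB embedding is fragile: rounding, reprojection, or simplification easily destroys the low-order bits. This is one reason robust schemes favour QIM or embed in higher bit positions, at the cost of larger distortion.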
2.2.2. Frequency Domain-Based Techniques
Frequency domain-based techniques organize the whole data set, or each feature (polyline or polygon), as a sequence of coordinates. The coordinate sequences are then transformed to the frequency domain using a numerical transform such as the Fourier, cosine, or wavelet transform. The watermark bits are embedded in the frequency-domain coefficients, and the result is converted back to the spatial domain with the corresponding inverse transform. In this way, each watermark bit is embedded in, and spread across, the coordinate sequence.
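The transform, embed, and inverse-transform pipeline can be sketched with the discrete Fourier transform. Each vertex is treated as a complex number x + iy, and one bit is QIM-embedded into the magnitude of a single coefficient. The choice of coefficient `k`, the step `delta`, and the function names are hypothetical. A naive O(n²) DFT keeps the example dependency-free.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dft(z: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(z)
    return [sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(Z: Sequence[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(Z)
    return [sum(Z[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def embed_dft(coords: Sequence[Point], bit: int, k: int, delta: float) -> List[Point]:
    """QIM-embed one bit into the magnitude of DFT coefficient k (k != 0), keeping its phase."""
    Z = dft([complex(x, y) for x, y in coords])
    offset = 0.0 if bit == 0 else delta / 2
    new_mag = round((abs(Z[k]) - offset) / delta) * delta + offset
    Z[k] = cmath.rect(new_mag, cmath.phase(Z[k]))
    return [(w.real, w.imag) for w in idft(Z)]


def extract_dft(coords: Sequence[Point], k: int, delta: float) -> int:
    """Blind extraction: is |Z[k]| closer to the bit-0 or the bit-1 lattice?"""
    mag = abs(dft([complex(x, y) for x, y in coords])[k])
    r = mag % delta
    return 0 if min(r, delta - r) <= abs(r - delta / 2) else 1
```

This illustrates why the frequency domain helps robustness. For k ≠ 0, the magnitude |Z[k]| is unchanged by translation, rotation, and a change of starting vertex. Meanwhile, the change to one coefficient is spread over all n vertices, each moving by at most `delta / (2n)`. Uniform scaling does change the magnitude, so a practical scheme would need to normalize for it.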
To ensure robustness, frequency domain-based techniques also use one-to-many mappings from watermark bits to coordinate sequences, as spatial domain-based techniques do. In general, frequency domain-based techniques are more robust than spatial domain-based techniques, but the data is less accurate after the watermark is embedded.
2.2.3. Lossless Watermarking Techniques
Lossless watermarking techniques cannot embed watermark bits into the values of geometric objects, whether directly or through a frequency transform, because this would reduce the accuracy of the data. Instead, they exploit special characteristics of vector geographic data. A typical example uses the vertex ordering of a polyline or polygon to store one bit of information: if the order is clockwise, the bit is 1, and if it is anticlockwise, the bit is 0.
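For a polygon ring, the clockwise/anticlockwise convention above can be sketched with the shoelace formula. The signed area is positive for anticlockwise rings and negative for clockwise rings in a y-up coordinate system. The function names are hypothetical, and zero-area (degenerate) rings cannot carry a bit under this rule.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def signed_area(ring: Sequence[Point]) -> float:
    """Shoelace formula: positive for anticlockwise rings, negative for clockwise (y axis up)."""
    n = len(ring)
    s = 0.0
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return s / 2.0


def orientation_bit(ring: Sequence[Point]) -> int:
    """Bit 1 for clockwise, bit 0 for anticlockwise (degenerate zero-area rings read as 0)."""
    return 1 if signed_area(ring) < 0 else 0


def embed_orientation(ring: Sequence[Point], bit: int) -> List[Point]:
    """Store `bit` in the vertex order; coordinates are reordered, never changed."""
    out = list(ring)
    if orientation_bit(out) != bit:
        out.reverse()
    return out
```

One practical caution: some formats prescribe ring orientation. For example, GeoJSON (RFC 7946) specifies anticlockwise exterior rings, so a tool that rewinds rings on export could erase bits stored this way.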
2.2.4. Zero-Watermarking Technique
Zero-watermarking techniques do not actually embed a watermark in the data. Rather, they generate one. The watermark is generated from invariant characteristics of the data so that it resists attacks such as map projection and coordinate reference system (CRS) transformation. The generated watermark is a meaningless watermark that characterizes the data, and it may be deposited with a trusted third party for future reference. Some techniques add a further step that combines this data-derived meaningless watermark with copyright information to form a meaningful watermark.
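The combination step is often realized with a bitwise XOR, although the cited works may use other operators. The following is a minimal sketch under that assumption. The feature bits are taken as given (e.g., from any invariant feature extractor), and the function names are hypothetical. At verification time, the features are recomputed from the suspect data and XORed with the registered zero-watermark. The recovered copyright bits are then compared with the owner's bits.

```python
from typing import List, Sequence


def make_zero_watermark(feature_bits: Sequence[int], copyright_bits: Sequence[int]) -> List[int]:
    """Combine data-derived feature bits with copyright bits by bitwise XOR."""
    if len(feature_bits) != len(copyright_bits):
        raise ValueError("feature and copyright sequences must have equal length")
    return [f ^ c for f, c in zip(feature_bits, copyright_bits)]


def recover_copyright(feature_bits: Sequence[int], zero_watermark: Sequence[int]) -> List[int]:
    """XOR is its own inverse: features of the suspect data XOR the stored watermark."""
    return make_zero_watermark(feature_bits, zero_watermark)


def bit_error_rate(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions where two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)
```

Each feature bit that an attack changes flips exactly one recovered copyright bit. The bit error rate therefore directly reflects how stable the chosen features are.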