1. Introduction
With the continuous breakthrough of remote sensing technology, the means of obtaining information through multi-source satellites, unmanned aerial vehicles, and other sensors have gradually increased, and the space-air-ground integrated system based on multiple sensors has gradually matured. Because different sensors acquire data by different methods and principles, the resulting Earth observation data have complementary information advantages, and edge intelligence greatly improves the speed of multi-source data transmission and processing [1]. To integrate multi-source remote sensing data, exploit these complementary advantages, and improve data availability, information fusion of remote sensing images is usually required. Multi-source image registration refers to the technology of matching images of the same scene obtained by different imaging sensors and establishing the spatial geometric transformation between them. Its accuracy and efficiency directly affect subsequent tasks, such as remote sensing interpretation in land-use surveillance [2].
However, owing to the large differences in the operating mechanisms and conditions of imaging sensors, multi-source remote sensing images often show significant variations in geometric and radiometric characteristics. In the registration process, these differences mean that the same landform or target presents distinct image features, especially internal texture features, so algorithms such as SIFT designed for homologous registration often fail [3]. Traditional multi-source image registration methods can be divided into two categories: area-based and feature-based methods. Area-based methods typically use similarity measures and optimization techniques to accurately estimate the transformation parameters. A typical solution is to evaluate similarity with mutual information (MI) [4], which is generally limited by its high computational complexity. Another approach is to improve processing efficiency through domain transform techniques such as the fast Fourier transform (FFT) [5]. However, for large-scale remote sensing images, area-based methods are sensitive to the severe noise introduced by imaging sensors and the atmosphere, which makes them difficult to apply widely. Feature-based methods are therefore more common in the remote sensing field. These methods usually detect salient features in the images (such as points, lines, and regions) and then identify correspondences by describing the detected features. The PSO-SIFT algorithm optimizes the computation of image gradients based on SIFT and exploits the intrinsic information of feature points to increase the number of accurate matches in multi-source image matching [6]. OS-SIFT focuses mainly on matching SAR and optical images, and uses the multiscale ratio of exponentially weighted averages and the multiscale Sobel operator to compute the SAR and optical image gradients, respectively, improving robustness [7]. The multiscale histogram of local main orientation (MS-HLMO) [8] develops a base feature map for local feature extraction and a matching strategy based on Gaussian scale space. The development of deep learning has accelerated computer vision research, and an increasing number of learning-based methods have emerged in the registration field [9]. A common strategy is to generate deep, high-level image representations, for example by training networks as feature detectors [10], descriptors [11], or similarity measures in place of the traditional registration steps. There are also many methods based on Siamese or pseudo-Siamese networks [12], which obtain deep feature representations by training the network [13] and then compute the similarity of the output feature maps to achieve template matching. However, few multi-source remote sensing datasets are available and their acquisition cost is high, which greatly limits the development of learning-based methods.
These early feature-based methods rely mainly on gradient information, but their application to multi-source image registration is often limited by their sensitivity to illumination and contrast differences. In recent years, since the concept of phase congruency (PC) [14] was introduced into image registration, more and more PC-based registration methods have shown superior performance. The histogram of orientated phase congruency (HOPC) feature descriptor achieves matching between multimodal images by constructing descriptors from phase congruency magnitude and orientation information [15]. Li et al. proposed the radiation-variation insensitive feature transform (RIFT) algorithm, which detects salient corner and edge feature points on the phase congruency map and constructs the maximum index map (MIM) descriptor from the Log-Gabor convolution image sequence [16]. Yao et al. proposed the histogram of absolute phase consistency gradients (HAPCG) algorithm, which extends the phase consistency model, establishes absolute phase consistency directional gradients, and builds the HAPCG descriptor [17]. Fan et al. proposed the 3MRS method, which builds a new template feature from the Log-Gabor convolution image sequence of a 2D phase consistency model and uses 3D phase correlation as the similarity metric [18]. In general, although these phase-congruency-based registration methods perform excellently in multi-source image matching, two problems remain for remote sensing image registration: (1) the interference of noise and texture with feature extraction cannot be avoided; and (2) the computation of phase congruency involves Log-Gabor transforms at multiple scales and orientations, which is computationally heavy and prolongs feature detection.
The key to automatic registration of multi-source images lies in how feature points are extracted and matched between heterogeneous images. Since edges are relatively stable features in multi-source images, remaining stable when the imaging conditions or modality change [19], feature extraction based on edge contour information can greatly enhance registration performance. As shown in Figure 1, for the same area, the terrain marked by the red box exhibits significant differences across multi-source images, while the edge marked by the green box tends to be more consistent.
Therefore, a multi-source remote sensing image registration method based on homogeneous features, namely the edge consistency radiation-variation insensitive feature transform (EC-RIFT), is proposed in this paper. First, to suppress speckle noise without destroying edge integrity, a non-local means filter is applied to the SAR image, and edge enhancement is applied to the texture-rich optical image to help extract coherent features. Second, image edge information is extracted based on phase congruency, and an orthogonal Log-Gabor filter (OLG) is used instead of a global filter for feature dimension reduction, speeding up feature detection. Finally, the EC-RIFT algorithm constructs descriptors on a sector neighborhood around each edge feature point, which is more sensitive to nearby points and improves descriptor robustness, thereby improving registration accuracy. The main contributions of this paper are as follows.
1. To capture the similarity of geometric structures and morphological features, phase congruency is used to extract image edges, and the differences between multi-source remote sensing images are suppressed by eliminating noise and weakening texture.
2. To reduce the computational cost, the phase congruency model is constructed using Log-Gabor filtering in two orthogonal directions.
3. To increase the reliability of descriptors with richer description information, sector descriptors are built based on edge consistency features.
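To illustrate contribution 2, the construction of frequency-domain Log-Gabor filters restricted to two orthogonal orientations can be sketched as follows. This is a simplified NumPy sketch, not the exact EC-RIFT implementation: the parameter values (`f0`, `sigma_ratio`, `sigma_theta`) and the function names are illustrative assumptions.

```python
import numpy as np

def log_gabor_bank(rows, cols, f0=0.1, sigma_ratio=0.55,
                   sigma_theta=np.pi / 6, orientations=(0.0, np.pi / 2)):
    """Build frequency-domain Log-Gabor filters at two orthogonal orientations.

    Radial part:  exp(-log(f/f0)^2 / (2*log(sigma_ratio)^2))
    Angular part: exp(-dtheta^2 / (2*sigma_theta^2))
    """
    u = np.fft.fftfreq(cols)
    v = np.fft.fftfreq(rows)
    U, V = np.meshgrid(u, v)
    f = np.sqrt(U ** 2 + V ** 2)
    f[0, 0] = 1.0                      # avoid log(0) at the DC component
    radial = np.exp(-np.log(f / f0) ** 2 / (2 * np.log(sigma_ratio) ** 2))
    radial[0, 0] = 0.0                 # zero DC response
    theta = np.arctan2(V, U)
    bank = []
    for t0 in orientations:
        dtheta = np.angle(np.exp(1j * (theta - t0)))   # wrapped angular distance
        angular = np.exp(-dtheta ** 2 / (2 * sigma_theta ** 2))
        bank.append(radial * angular)
    return bank

def filter_responses(img, bank):
    """Apply each filter in the Fourier domain; returns complex responses."""
    F = np.fft.fft2(img)
    return [np.fft.ifft2(F * H) for H in bank]
```

Compared with a full bank covering many orientations, restricting the filters to two orthogonal directions reduces the number of convolutions per scale, which is the source of the detection-time saving reported in Section 3.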
The rest of this paper is organized as follows: Section 2 introduces the proposed method in detail. Section 3 evaluates the proposed method by comparing experimental results with those of other representative methods. Section 4 discusses several important aspects of the proposed method. Finally, the paper is concluded in Section 5.
3. Experiments and Results
In this section, we first introduce the datasets and evaluation metrics. We then investigate the effectiveness of the proposed method stage by stage. Finally, to demonstrate the superiority of EC-RIFT, the results are presented and compared with those of three representative multi-source image registration methods.
3.1. Datasets
Two types of datasets are used to evaluate the registration performance of the EC-RIFT algorithm. The first consists of 7 pairs of SAR and optical images with various sizes and sources. Image pairs in other modalities are also employed to assess the generalization ability of the proposed algorithm. Both datasets are described in detail below.
The SAR and optical images are obtained from different sensors, including TerraSAR-X, the Gaofen satellites, Google Earth, the Landsat satellites, and aerial vehicles, covering various resolutions and sizes. The multi-source image dataset consists of infrared-optical, day-night, and depth-optical pairs widely employed in remote sensing image registration. All image pairs capture various terrains, including urban areas, fluvial deposits, rivers, and farmland, and the images exhibit significant geometric and radiometric differences, posing challenges for registration. The data are shown in Figure 6, in which (a)-(f) denote the SAR-optical pairs and (g)-(i) present the multi-source pairs employed in the experiments.
3.2. Evaluation Criterion
For quantitative evaluation, the repeatability [27], the root mean square error (RMSE), and the running time are selected to evaluate the performance of the proposed method.
Repeatability [27] is often used to evaluate the quality of detected features; it is defined as the rate of potential corresponding points among all detected points. The repeatability is calculated as Equation (14):

$$ \mathrm{repeatability} = \frac{N_c}{\min(N_r, N_m)} \qquad (14) $$

where $N_c$ represents the number of potential corresponding points, and $N_r$ and $N_m$ represent the total numbers of feature points detected in the reference image and the moving image, respectively. A feature point is considered a potential corresponding point if it satisfies the condition in Equation (15):

$$ \left\| x_i - T(x'_i) \right\| \le \varepsilon \qquad (15) $$

where $i$ is the index of the keypoint, $x_i$ is the keypoint coordinate in the reference image, $T(x'_i)$ is the keypoint coordinate obtained after affine transformation of the moving image, and $\varepsilon$ is the distance threshold in pixels.
The root mean square error (RMSE) is used to evaluate the precision of the registration results. RMSE is defined as Equation (16):

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left\| x_i - T(x'_i) \right\|^2} \qquad (16) $$

where $n$ is the number of matching pairs, $x_i$ is the keypoint coordinate in the reference image, and $T(x'_i)$ is the keypoint coordinate obtained after affine transformation of the image to be registered. The smaller the RMSE, the higher the registration precision.
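Both metrics can be computed directly from the detected keypoint sets and the ground-truth affine transformation. The following is a minimal NumPy sketch; the default tolerance `eps` and the helper names are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def apply_affine(A, pts):
    """Apply a 2x3 affine matrix A to an (n, 2) array of keypoint coordinates."""
    pts = np.asarray(pts, dtype=float)
    return pts @ A[:, :2].T + A[:, 2]

def repeatability(ref_pts, mov_pts, A, eps=2.0):
    """Fraction of detected points with a counterpart within eps pixels."""
    warped = apply_affine(A, mov_pts)
    ref = np.asarray(ref_pts, dtype=float)
    # pairwise distances between reference points and warped moving points
    d = np.linalg.norm(ref[:, None, :] - warped[None, :, :], axis=2)
    n_corr = np.count_nonzero(d.min(axis=1) <= eps)
    return n_corr / min(len(ref_pts), len(mov_pts))

def rmse(ref_pts, mov_pts, A):
    """Root mean square error over matched keypoint pairs."""
    residual = np.asarray(ref_pts, dtype=float) - apply_affine(A, mov_pts)
    return float(np.sqrt(np.mean(np.sum(residual ** 2, axis=1))))
```

For example, with an identity transformation and identical point sets, `repeatability` returns 1.0 and `rmse` returns 0.0.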
3.3. Registration Results and Analysis
To demonstrate the effect of each processing step on the registration results, ablation experiments are designed. Comparison experiments are then conducted with the HAPCG, OS-SIFT, and RIFT algorithms to show that the proposed method achieves better registration performance.
3.3.1. Comparison Experiment of Preprocessing Algorithms
The preprocessing step helps to register multi-source images with large radiometric differences. To demonstrate the effectiveness of the preprocessing applied in this paper, pair6 is chosen for the experiment; the similarity of its original data is too low for any of the four algorithms to register it.
Figure 7 depicts the registration results obtained with and without the denoising and enhancement processes. Panel (a) shows the matching result on the original data, which clearly cannot be registered. In (b) and (c), only the SAR image or only the optical image is preprocessed; even with one image processed, the pairs fail to register. For the denoised SAR image and enhanced optical image, the corresponding points are matched correctly, as shown in (d). It is clear that the noise in the original SAR image and the texture in the optical image strongly affect registration, and more correct matching pairs are obtained after preprocessing.
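The two preprocessing steps can be sketched as follows: a toy patch-based non-local means filter for the SAR image and unsharp masking as a simple form of edge enhancement for the optical image. This is an educational sketch under assumed parameters (`patch`, `search`, `h`, `sigma`, `amount`); the paper's actual filter settings may differ, and a production implementation would use an optimized library routine.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def nlm_denoise(img, patch=3, search=7, h=0.1):
    """Toy non-local means: each pixel becomes a weighted average of pixels in a
    search window, weighted by patch similarity (O(n^2 * w^2), for small images)."""
    pad = search // 2 + patch // 2
    padded = np.pad(np.asarray(img, dtype=float), pad, mode="reflect")
    out = np.zeros(img.shape, dtype=float)
    pr, sr = patch // 2, search // 2
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            ci, cj = i + pad, j + pad
            ref = padded[ci - pr:ci + pr + 1, cj - pr:cj + pr + 1]
            weights, total = 0.0, 0.0
            for di in range(-sr, sr + 1):
                for dj in range(-sr, sr + 1):
                    cand = padded[ci + di - pr:ci + di + pr + 1,
                                  cj + dj - pr:cj + dj + pr + 1]
                    w = np.exp(-np.mean((ref - cand) ** 2) / h ** 2)
                    weights += w
                    total += w * padded[ci + di, cj + dj]
            out[i, j] = total / weights
    return out

def unsharp_enhance(img, sigma=2.0, amount=1.0):
    """Edge enhancement by unsharp masking: img + amount * (img - blur(img))."""
    blurred = gaussian_filter(np.asarray(img, dtype=float), sigma)
    return img + amount * (img - blurred)
```

Non-local means averages over similar patches rather than a fixed neighborhood, which is why it suppresses speckle while leaving edges largely intact; unsharp masking amplifies the high-frequency residual, strengthening edge contours in the optical image.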
In addition, comparison experiments are conducted on images preprocessed with the aforementioned steps. As listed in Table 1, our method performs best among the four methods in terms of RMSE and running time.
It should be noted that this step is optional for remote sensing image registration. The data in this experiment contain severe radiometric and geometric distortions, which pose greater challenges for registration, and the proposed steps contribute to extracting corresponding features.
3.3.2. Comparison Experiment of OLG and LG
Since OLG algorithm can reduce redundancy in feature detection, the repeatability is chosen to evaluate its effectiveness. A higher repeatability indicates a higher probability of extracting the corresponding features. The results are shown in
Figure 8.
The OLG algorithm achieves scores higher than or equal to LG on all images, demonstrating the contribution to improving the registration accuracy.
In addition, the feature detection time and registration precision of OLG are tested in this experiment, and the results are listed in Table 2. As can be seen from Table 2, for all image pairs OLG reduces the feature detection time by nearly half, and the accuracy is improved on all data.
3.3.3. Comparison Experiment of Square Descriptor and Sector Descriptor
The purpose of this experiment is to compare the performance of the square descriptor and the sector descriptor used in this paper. The similarity between the descriptor vectors of paired features determines the success rate of matching. To show that the proposed descriptors capture the common information better, we choose two metrics to measure the similarity of paired descriptors. Cosine similarity and the Pearson correlation coefficient are commonly used to measure the similarity between two vectors; both lie in the range [-1, 1], where 1 means the vectors are completely similar and -1 indicates the opposite.
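The two similarity measures take only a few lines of NumPy; the descriptor vectors passed in are placeholders for the square or sector descriptors built at each keypoint.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two descriptor vectors, in [-1, 1]."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return cosine_similarity(a - a.mean(), b - b.mean())
```

The Pearson coefficient discards the mean component of each vector, so it is insensitive to a constant offset between the two descriptors, while cosine similarity is insensitive to their overall scale; together the two metrics give a fuller picture of how well paired descriptors agree.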
Five pairs of points are selected randomly from pair1 for this experiment, and descriptors are built with the square and sector methods, respectively. The similarities of the vectors are shown in Figure 9. The bars in light colors represent cosine similarity, and those in dark colors represent the Pearson correlation coefficients of the two methods. The sector descriptor (orange bars) outperforms the square descriptor (blue bars) on all selected points. This shows that descriptors based on a sector neighborhood fit corresponding-point matching better.
In addition, the comparison of registration results is shown in Figure 10. For the reliability of the experiment, all parts of the method are kept identical except for the descriptor construction step. The result demonstrates that our descriptors are more robust and thus more conducive to multi-source image registration.
3.3.4. Comparative Results with Other Methods
To analyze the performance of the proposed method, its accuracy and efficiency are compared with those of three methods: HAPCG, OS-SIFT, and RIFT. For the fairness of the experiment, all methods use the same matching method, and parameters are set according to the recommendations of their authors.
Figure 11 presents the matching results of the EC-RIFT algorithm. It is clear that sufficient features are matched correctly, confirming the effectiveness of the proposed method for SAR and optical image registration.
Table 3 shows the quantitative results of the four methods. The EC-RIFT method steadily registers the image pairs with lower error and shorter running time.
OS-SIFT shows the weakest performance. For pair2-pair5, the × means that the algorithm fails to register, and the annotation in the table indicates that the number of correct matches is less than five, which we consider unconvincing. The other methods obtain enough matching pairs to accomplish the registration task. In terms of RMSE, the proposed method outperforms the other three methods on all image pairs. For pair2, the results obtained by our method are 38.97% and 11.19% better than those obtained by HAPCG and RIFT, respectively, a significant accuracy improvement. The running times of the HAPCG method on pair4 and pair5 are shorter than ours, but as shown in Figure 12, the average time of our method over all image pairs is 7.31 s, which is still the shortest among these methods. It should be noted that OS-SIFT is annotated because it obtains valid results only on pair1 and takes longer than our algorithm on that pair.
3.4. Experiments on Multi-Source Images
The experimental results on SAR-optical images demonstrate the effectiveness and superiority of our method. As an extension, this subsection focuses on the registration performance for other multi-source images. The visual results are shown in Figure 13; EC-RIFT successfully matches a large number of points on these image pairs.
Table 4 lists the comparative registration results of RIFT and EC-RIFT on the multi-source images. Our method obtains robust registration results and generally outperforms RIFT in terms of RMSE and running time, implying that it has good generalization ability and can be transferred to images in other modalities.