1. Introduction
Geometric model fitting, which refers to the accurate estimation of model parameters from noisy data, is a fundamental problem in computer vision, e.g., homography/fundamental matrix estimation for plane detection and motion estimation. However, this task is non-trivial due to noise and outliers, and the situation is even more challenging when the data contain multiple geometric structures. To illustrate the problem addressed in this paper, we take planar homography estimation as an example. As presented in Figure 1 ("ladysymon" from AdelaideRMF [1]), we deal with the matched points in two-view images (Image 1 and Image 2) to estimate the homography matrix. There are two "plane" structures corresponding to two homographies, and the number of model instances (homographies) and the ratios of inliers (correctly matched points) and outliers (mismatched points) are unknown. With the proposed model fitting method, the two "plane" structures corresponding to two homographies can be estimated, and the correctly matched points on each plane can be segmented as the inliers of the corresponding homography, excluding the impact of the outliers. Certainly, the structures are not restricted to homographies; we also consider the fundamental matrix in this paper.
Over the last 30 years, the classical random sample consensus (RANSAC) method [2] and the RANSAC family of methods [3] have been widely used to handle the model fitting problem with outliers. These methods often work well for single-model fitting, but they are not appropriate for multi-model fitting (e.g., sequential RANSAC [4,5] and multi-RANSAC [6]), because much of the data must be treated as noise/outliers, i.e., a datum belonging to one model acts as an outlier to every other model, which yields many pseudo-outliers [7]. Hence, geometric multi-model fitting is still an open issue [8].
The multi-model fitting problem can be considered a typical chicken-and-egg problem [9]: both the data-to-model assignments and the model parameters are unknown, but given the solution to one sub-problem, the solution of the other can be easily derived. Multi-model fitting methods start with a sampling process that generates a large number of hypotheses. Because there are usually multiple underlying structures and gross outliers, the minimum sample set (MSS) used to calculate a hypothesis often contains outliers or pseudo-outliers, which makes it practically impossible to distinguish the inliers belonging to different models by using the hypothesis parameters or hypothesis residuals directly.
To address this issue, preference analysis based methods [9,10,11,12,13,14,15,16,17,18,19,20,21,22] were developed, which use the hypothesis residuals to compute a preference set for each data point for clustering. J-linkage [10,11], the earliest preference analysis based method, adopts a conceptual preference for points by binarizing the residuals with an inlier threshold, and introduces the Jaccard distance to measure the similarity between two points' preference sets for linkage clustering. Finally, the inliers belonging to different models are segmented into different clusters. T-linkage [12,13], an improved version of J-linkage, uses a relaxation of the binary preference function and the soft Tanimoto distance to improve the conceptual preference of J-linkage for better clustering. Robust preference analysis (RPA) [16] represents the data points in a conceptual space as in J-linkage, and then performs robust principal component analysis (PCA) and symmetric non-negative matrix factorization (NMF) to decompose the multi-model fitting problem into many single-model fitting problems, which in turn are solved by M-estimator sample consensus (MSAC) [23]. However, both the conceptual preference in J-linkage and the soft conceptual preference in T-linkage require an inlier threshold to exclude the impact of the outliers, which makes these methods less robust and quite inconvenient to use when dealing with outliers.
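To make these preference representations concrete, the following is a minimal sketch (NumPy; variable names are illustrative, and the residual matrix and thresholds are assumed inputs rather than part of any cited implementation) of the binary conceptual preferences and Jaccard distance of J-linkage, alongside the relaxed preferences and Tanimoto distance of T-linkage:

```python
import numpy as np

def jlinkage_preferences(residuals, epsilon):
    """Binary conceptual preferences (J-linkage): point p 'prefers'
    hypothesis h if its residual is within the inlier threshold.
    residuals: (n_points, n_hypotheses) array of absolute residuals."""
    return (residuals <= epsilon).astype(float)

def tlinkage_preferences(residuals, tau):
    """Relaxed (soft) preferences (T-linkage): exponentially decaying
    vote, cut to zero beyond 5*tau as in the T-linkage formulation."""
    pref = np.exp(-residuals / tau)
    pref[residuals > 5 * tau] = 0.0
    return pref

def jaccard_distance(a, b):
    """Jaccard distance between two binary preference sets."""
    union = np.sum(np.maximum(a, b))
    inter = np.sum(np.minimum(a, b))
    return 1.0 - inter / union if union > 0 else 1.0

def tanimoto_distance(a, b):
    """Soft Tanimoto distance between two continuous preference vectors."""
    dot = np.dot(a, b)
    denom = np.dot(a, a) + np.dot(b, b) - dot
    return 1.0 - dot / denom if denom > 0 else 1.0
```

Note how both distances, and hence the resulting clustering, hinge on the threshold `epsilon` (or the scale `tau`), which is exactly the sensitivity discussed above.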
To avoid the problems brought by the inlier threshold, permutation preference uses the order numbers of the sorted residuals without introducing an inlier threshold, and it is widely used in multi-model fitting methods [9,18,19,20,21,22,24]. For instance, kernel fitting (KF) [17] uses the permutation obtained by sorting the residuals of the hypotheses as the preference to build a Mercer kernel that elicits potential points belonging to a common structure, so that the outliers and noise can be removed. Permutation preference is also used to represent the hypotheses for mode seeking [9,19]. Wong [20] used permutation preference for hypothesis sampling and inlier clustering. The simulated annealing based random cluster model (SA-RCM) [21] integrates permutation preference with graph-cut optimization for hypothesis sampling, and the multi-model fitting task is solved efficiently in a simulated annealing framework. Lai [24] combines permutation preference analysis with information theory principles to build a discriminative sparse affinity matrix for clustering.
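As a hedged illustration of this threshold-free representation, the sketch below (NumPy; the top-k ranking and the overlap measure are our own illustrative choices, capturing the shared idea rather than any single cited method) builds a permutation preference per point and a simple affinity from shared preferred hypotheses:

```python
import numpy as np

def permutation_preferences(residuals, k):
    """Permutation preference: for each point, the indices of its k
    best-fitting hypotheses, ordered by increasing residual.
    residuals: (n_points, n_hypotheses); returns (n_points, k) ints."""
    return np.argsort(residuals, axis=1)[:, :k]

def preference_overlap(pref_a, pref_b):
    """Affinity of two points as the fraction of hypotheses shared by
    their top-k preference lists (no inlier threshold involved)."""
    shared = np.intersect1d(pref_a, pref_b).size
    return shared / pref_a.size
```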
The preference analysis based methods make great efforts to exploit the residuals, but they neglect the spatial distribution of the data points, i.e., the fact that inliers belonging to the same model are usually spatially close in the image. A series of energy-based fitting methods [21,25,26,27,28,29,30] have therefore been proposed to optimize the fitting solution by accounting for the model error while encouraging spatial smoothness of the points. The energy-based fitting methods formulate geometric multi-model fitting as an optimal labeling problem with a global energy function balancing the geometric errors and the regularity of the inlier clusters, and the optimal labeling problem is solved by means of the α-expansion algorithm [31]. Similar graph-based methods [32,33] have been proposed to solve this problem, and hypergraphs [34] have also been introduced to represent the relation between hypotheses and data points for multi-model fitting [35,36].
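For reference, the labeling energy minimized by the energy-based methods above typically has the following general form (notation ours; the exact terms and weights vary across [21,25,26,27,28,29,30]):

```latex
E(L) \;=\; \underbrace{\sum_{p \in \mathcal{P}} \lVert r_p(L_p) \rVert}_{\text{geometric error}}
\;+\; \lambda \underbrace{\sum_{(p,q) \in \mathcal{N}} w_{pq}\,\big[L_p \neq L_q\big]}_{\text{spatial smoothness}}
\;+\; \beta \underbrace{\,\lvert \mathcal{L}(L) \rvert\,}_{\text{label count}}
```

Here L assigns each point p a model (or outlier) label L_p, r_p(L_p) is the residual of p to its assigned model, N is the neighborhood graph over the points, [·] is 1 when adjacent points take different labels, and |L(L)| counts the distinct labels in use.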
Although the energy-based fitting methods make good use of the spatial prior that inliers belonging to the same model tend to be neighbours in the image, and thereby achieve promising inlier segmentation results, outliers are randomly distributed throughout the image. Therefore, the energy-based fitting methods can hardly handle outliers by using the data error or spatial smoothness alone, and they always need an extra inlier threshold or scale estimator [37,38,39,40,41,42,43] to exclude the impact of the outliers. However, most scale estimation methods depend on a certain noise distribution model (usually a Gaussian distribution), while in the geometric model fitting problem the noise distribution is extremely complicated after sampling, feature extraction, and matching, which makes the scale estimators perform poorly. This makes the energy-based fitting methods very restricted for geometric model fitting.
Most of the time, the key to improving the fitting accuracy lies in the outliers, whose residuals to all the true models in the data set are larger than those of the inliers; this constitutes the consensus of the outliers. When the proportion of good hypotheses is high enough after the sampling process, the quantized residual preferences [44,45] of the outliers tend to have large values, which makes the outliers gather away from the inliers in the quantized residual preference space. However, previous outlier detection by quantized residual preferences was conducted only in preference space, which makes the results sensitive to the sampling process: once the percentage of correct models from the sampling process is not high enough, the result is poor. Considering the distribution of the points in the quantized residual preference space, energy minimization can be successfully used to deal with the outliers without scale estimation. In this paper, we extend energy minimization to the quantized residual preference space by constructing the neighborhood graph from the Delaunay triangulation of the points in quantized residual preference space, rather than from the points' image coordinates, and by rearranging the data cost of the outlier label according to the pseudo-outliers. To detect as many outliers as possible, the energy minimization process follows a framework of alternate sampling and labeling, which alternates between energy minimization based labeling optimization and sampling model hypotheses within the labeled inlier clusters for the next round of energy minimization, so that the sampling and labeling processes are mutually improved. After the outlier detection process, an inlier segmentation process based on a conventional energy minimization fitting method is applied to the data points with the outliers removed, where the neighborhood graph is constructed from the Delaunay triangulation of the points' image coordinates. The energy-based inlier segmentation process also follows the alternate sampling and labeling framework.
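The following is a minimal sketch (NumPy/SciPy; the exact quantization rule, the variable names, and the use of a low-dimensional embedding are our assumptions, not the paper's reference implementation) of how quantized residual preferences can be formed and how a Delaunay-based neighborhood graph can be built in that preference space:

```python
import numpy as np
from scipy.spatial import Delaunay

def quantized_residual_preferences(residuals, q_len, q_level):
    """One plausible quantization: divide residuals by the quantization
    length q_len and cap at the quantization level q_level, so that all
    large (outlier-like) residuals fall into the same top bin.
    residuals: (n_points, n_hypotheses) absolute residuals."""
    quantized = np.floor(residuals / q_len)
    return np.minimum(quantized, q_level)

def preference_space_neighbors(embedded_prefs):
    """Neighborhood graph from the Delaunay triangulation of the points
    embedded in a low-dimensional projection of the quantized residual
    preference space (qhull is impractical in very high dimensions)."""
    tri = Delaunay(embedded_prefs)
    edges = set()
    for simplex in tri.simplices:
        for i in range(len(simplex)):
            for j in range(i + 1, len(simplex)):
                edges.add((min(simplex[i], simplex[j]),
                           max(simplex[i], simplex[j])))
    return sorted(edges)
```

The returned edge list would then define the smoothness term of the energy, with the outlier label's data cost set from the pseudo-outliers as described above.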
The rest of this paper is organized as follows. In
Section 2, we introduce the proposed method in detail. The experiments in geometric multi-model fitting, including two-view plane segmentation and two-view motion segmentation, are presented in
Section 3. Finally, we draw our conclusions in
Section 4.
3. Results
In this section, we present the experiments undertaken in geometric multi-model fitting, covering two-view plane segmentation and two-view motion segmentation. Two-view plane segmentation is essentially a multi-homography estimation problem: a plane can be parametrized by the homography matrix calculated from the matched points on that plane in the two images. So, in order to achieve two-view plane segmentation, we need to fit multiple homographies to the matched points. Each time, at least four pairs of matched points are needed to calculate the homography matrix using the DLT method [49], and we use the Sampson distance to compute the residuals.
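As a hedged sketch of this hypothesis-generation step (the standard DLT, not the paper's own code), a homography can be estimated from four or more point pairs as follows; in practice, OpenCV's `cv2.findHomography(src, dst, 0)` gives an equivalent least-squares DLT estimate:

```python
import numpy as np

def homography_dlt(pts1, pts2):
    """Direct Linear Transform: estimate H (3x3, up to scale) from
    n >= 4 matched point pairs, pts1/pts2 of shape (n, 2)."""
    rows = []
    for (x, y), (u, v) in zip(pts1, pts2):
        # Each correspondence contributes two rows of the 2n x 9 system.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 3)
```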
We first tested the proposed outlier detection method on "Merton College II" for two-view plane segmentation, compared to kernel fitting [17]. We added 75 outlier corners to the data; the ground-truth outliers and inliers are labelled with different shapes in Figure 6.
In the outlier detection experiment on "Merton College II", the proposed method detected all 75 outliers, with only 3 inliers falsely detected as outliers, while the kernel fitting method found only 66 outliers and had 4 false detections. The corresponding outlier detection results are presented in Figure 7.
For two-view motion segmentation, the rigid-body motion between the two views can be described by a fundamental matrix, so two-view motion segmentation is essentially a multi-fundamental-matrix estimation problem. In the multi-fundamental-matrix estimation, we use the normalized 8-point algorithm [50] to estimate the fundamental matrix, and we calculate the residuals by the Sampson distance.
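A hedged sketch of this step (the standard normalized 8-point algorithm and Sampson distance, not the paper's implementation):

```python
import numpy as np

def normalize(pts):
    """Translate points to their centroid and scale so the mean
    distance from the origin is sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T

def fundamental_8point(pts1, pts2):
    """Normalized 8-point algorithm: F from n >= 8 matches, (n, 2) each."""
    x1, T1 = normalize(pts1)
    x2, T2 = normalize(pts2)
    # Each row encodes the epipolar constraint x2^T F x1 = 0 (row-major F).
    A = np.column_stack([x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
                         x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
                         x1[:, 0], x1[:, 1], np.ones(len(x1))])
    _, _, vt = np.linalg.svd(A)
    F = vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    u, s, v = np.linalg.svd(F)
    F = u @ np.diag([s[0], s[1], 0.0]) @ v
    return T2.T @ F @ T1  # undo the normalization

def sampson_distance(F, p1, p2):
    """First-order geometric error of a match under F."""
    x1, x2 = np.append(p1, 1.0), np.append(p2, 1.0)
    Fx1, Ftx2 = F @ x1, F.T @ x2
    return (x2 @ F @ x1) ** 2 / (Fx1[0]**2 + Fx1[1]**2 + Ftx2[0]**2 + Ftx2[1]**2)
```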
The proposed outlier detection method for two-view motion segmentation was initially tested on "cars5" from the Hopkins 155 motion dataset, which is mainly used for testing motion segmentation algorithms that segment feature trajectories [51]. In this experiment, we selected two frames (the 1st and the 21st) from the "cars5" video sequence and the corresponding tracked features as the ground-truth inliers, and then added 100 outliers to test the proposed outlier detection algorithm.
In the outlier detection experiment on "cars5" for two-view motion segmentation, the proposed method successfully detected 90 of the outliers, missing only 10 and producing no false detections, while the kernel fitting method found only 59 outliers and had 35 falsely detected outliers. The proposed method thus shows clear superiority over kernel fitting, detecting the most outliers with few false detections. The corresponding outlier detection results are presented in Figure 9.
Furthermore, the proposed outlier detection method was fully tested on the AdelaideRMF data set to show the performance of outlier detection and the corresponding overall inlier classification for both two-view plane segmentation and two-view motion segmentation. Comparisons in inlier segmentation were undertaken with the state-of-the-art methods of "propose expand and re-estimate labels" (PEARL) [25], SA-RCM [21], J-linkage [10], T-linkage [12], Prog-X [30], and CLSA [52]. The overall misclassification percentage (the number of misclassified points divided by the number of points in the data set) [53] is used to represent the model fitting performance.
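For clarity, the metric amounts to the following trivial helper (illustrative names; it assumes the predicted model labels have already been matched to the ground-truth models, e.g., by Hungarian assignment):

```python
import numpy as np

def misclassification_percentage(pred_labels, gt_labels):
    """Overall misclassification: the share of points whose predicted
    label (model index or outlier) differs from the ground truth."""
    pred, gt = np.asarray(pred_labels), np.asarray(gt_labels)
    return 100.0 * np.mean(pred != gt)
```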
The outlier detection results are shown in Figure 10, compared to the results of kernel fitting [17], for both two-view plane segmentation and two-view motion segmentation. Figure 10(a)–10(f) show the two-view plane segmentation data and Figure 10(g)–10(l) show the two-view motion segmentation data. We can see that both the kernel fitting method and the proposed method find most of the outliers in the data. However, most of the time, the outliers detected by kernel fitting contain more inliers, and more outliers are left undetected, than with the proposed method. The proposed method can usually identify almost all the outliers, with very few undetected outliers and no false detections, except for "dinobooks". In the "dinobooks" data, a number of model inliers are misclassified as outliers, because the even division of the data points during the region-based random sampling process makes the MSS contain some outliers, which in turn places those inliers close to the outliers in the quantized residual preference space and thus makes them difficult to separate from the outliers. Adequate sampling of the inliers of each real model could further improve the performance of the proposed algorithm, which will be the focus of our future work.
Table 1 shows the number of correctly detected outliers ("Correct"), undetected outliers ("Missing"), and falsely detected outliers ("False") for kernel fitting and the proposed method. This quantitative comparison indicates that the proposed method can generally detect almost all the outliers in the data, with fewer falsely detected and undetected outliers than kernel fitting.
Table 2 shows the misclassification results for two-view plane segmentation, compared to PEARL [25], J-linkage [10], T-linkage [12], SA-RCM [21], Prog-X [30], and CLSA [52]. Please note that the misclassification percentages for CLSA reported in [52] retain only two decimal places, so the misclassifications for "neem", "oldclassicswing", and "sene" are at the same level as the proposed method. It can be seen that the proposed method obtains the lowest level of misclassification on most of the data sets. The corresponding inlier segmentation results are presented in Figure 11, where most of the inliers on the different planes are segmented quite accurately. For the "johnsonb" data, there are seven planes, and the inliers of model 3 (points labeled in blue) occupy a large proportion of the data, while the inliers of model 1 (red points) and model 6 (magenta points) occupy much smaller proportions. This results in uneven sampling: few hypotheses are generated from the inliers of models 1 and 6, making these two models difficult to extract. Nevertheless, the proposed method can separate models with both large and small inlier scales quite well, and it obtains a much lower misclassification level than some of the state-of-the-art methods. Our method performs better in the cases with more model instances, namely the "johnsona" and "johnsonb" data sets.
The misclassification results for two-view motion segmentation are presented in Table 3, compared with the same six methods as in the two-view plane segmentation experiment. These six data sets are selected from the AdelaideRMF data set. The proposed method obtains the lowest misclassification level on most of the data sets, except for the "breadtoycar" and "dinobooks" data, and even obtains zero misclassification on two of the data sets. The corresponding inlier segmentation results are presented in Figure 12, from which we can see that most of the inliers of the different fundamental matrix models are segmented quite accurately, except for the "dinobooks" data, where many inliers of model 1 (red points) are classified as outliers (Figure 10(l)) and the inlier segmentation result is poor. For the "dinobooks" data, the proportion of outliers (43%) is very high and, each time, eight points need to be randomly selected to generate a fundamental matrix hypothesis. The MSS therefore has a high probability of containing outliers under random sampling within sub-regions evenly divided by the Euclidean distance of the data points. This makes the proportion of good hypotheses very low and seriously impacts the performance of the quantized residual preferences, resulting in poor outlier detection. Since the outlier detection result directly affects the final inlier segmentation accuracy, improving the sampling method will help to solve this problem; we will therefore consider introducing preference analysis into the sampling process in future work.
In the experiments, during the region-based random sampling, the size of each sub-region is set to 20, which means that the 20 nearest neighbouring points make up a sub-region, and each time we randomly sample 200 hypotheses in a sub-region. For outlier detection, the common residuals of good hypotheses need to be quantized to close values, so the quantization level should be small; meanwhile, the large residuals most likely belonging to the outliers need to be strongly separated from the inliers' residuals, so the quantization length is usually set to 1. Most of the time, the quantization level and quantization length parameters are closely related to the type of model. In two-view plane segmentation (multi-homography estimation), the chosen quantization level and quantization length give quite good results for all the data, while in two-view motion segmentation (multi-fundamental matrix estimation), a different quantization level and quantization length are suitable.
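As a hedged sketch of the region-based random sampling step (our own illustrative reading of the description above, not the authors' code, with k-means standing in for the "even division by Euclidean distance"):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def region_based_sampling(points, mss_size, region_size=20,
                          hypotheses_per_region=200, seed=0):
    """Divide the data into spatially compact sub-regions of roughly
    `region_size` points each, then draw `hypotheses_per_region`
    random minimal sample sets (MSSs) inside each sub-region.
    points: (n, 2) image coordinates; mss_size: 4 for a homography,
    8 for a fundamental matrix."""
    rng = np.random.default_rng(seed)
    n_regions = max(1, len(points) // region_size)
    _, labels = kmeans2(np.asarray(points, dtype=float), n_regions,
                        minit='++', seed=seed)
    mss_list = []
    for r in range(n_regions):
        region = np.flatnonzero(labels == r)
        if len(region) < mss_size:
            continue  # too few points here to form an MSS
        for _ in range(hypotheses_per_region):
            mss_list.append(rng.choice(region, size=mss_size,
                                       replace=False))
    return mss_list
```

Each MSS would then be fed to the DLT or normalized 8-point estimator sketched earlier to generate the hypothesis pool for the quantized residual preferences.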