The main purpose of this paper is to disclose the newly discovered solution to 3D projection in human-like binocular vision. Nevertheless, to make the paper more convincing, we present a set of preliminary experimental results in this section.
5.1. Real Experiment Validating Equation of 3D Inverse Projection of Position
Here, we would like to share an experiment in which we make use of low-cost hardware with low-resolution binocular cameras and a small checkerboard. In this way, we can appreciate the validity of Equation (20) and the theoretical result summarized in Figure 6.
As shown in Figure 8, the experimental hardware includes a Raspberry Pi single-board computer, a binocular vision module, and a checkerboard. The image resolution of the binocular cameras is pixels. The checkerboard has the size of cm and is divided into squares with the size of cm each.
Figure 8.
Experimental hardware: a Raspberry Pi single-board computer with a binocular vision module, and a checkerboard which serves as the source of both calibration data-points and test data-points.
Among the corner points of the checkerboard, some serve as calibration data-points for determining the matrix in Equation (20), while the others serve as test data-points for the calibration result (i.e., for testing the validity of the matrix in Equation (20)).
Referring to Equation (20), the matrix has nineteen independent elements or parameters. Since each instance of Equation (20) imposes three constraints, at least seven pairs of corresponding data-points are needed to fully determine the matrix.
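The calibration step described above amounts to a linear least-squares problem. The sketch below illustrates it on synthetic data, assuming a hypothetical matrix shape and a hypothetical feature vector q built from the left/right pixel coordinates; the exact composition of q is given by Equation (20) and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature vector for one point pair: the exact composition
# follows Equation (20) in the paper; a 7-element vector built from the
# left/right pixel coordinates is assumed here purely for illustration.
def feature(ul, vl, ur, vr):
    return np.array([ul, vl, ur, vr, ul * ur, vl * vr, 1.0])

# Synthetic ground-truth mapping (3 x 7), standing in for the matrix in Eq. (20).
M_true = rng.normal(size=(3, 7))

# Simulate more than seven calibration data-points: random pixel
# coordinates and the 3D coordinates they map to under M_true.
pixels = rng.uniform(0.0, 640.0, size=(10, 4))
Q = np.stack([feature(*p) for p in pixels])      # shape (10, 7)
X = Q @ M_true.T                                 # shape (10, 3), 3D coordinates

# Each data-point pair contributes three linear constraints; with seven
# or more pairs, least squares determines every entry of the matrix.
M_est, *_ = np.linalg.lstsq(Q, X, rcond=None)
M_est = M_est.T

print(np.allclose(M_est, M_true, atol=1e-8))     # recovered exactly from clean data
```

With real measurements the system is overdetermined and noisy, and the same least-squares solution then minimizes the residual over all calibration data-points.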
As shown in Figure 9, we define a reference coordinate system as follows: its Z axis is parallel to the ground and points toward the scene; its Y axis is perpendicular to the ground and points downward; its X axis points toward the right-hand side.
Then, we place the checkerboard at four locations in front of the binocular vision system. The Z coordinates of these four locations are 1.0 m, 1.5 m, 2.0 m, and 2.5 m, respectively. The checkerboard is perpendicular to the Z axis, which passes through one designated test data-point. Therefore, the X and Y coordinates of the calibration data-points and the test data-points are known in advance; their values are shown in Figure 9.
Figure 9.
Data set for calibrating the matrix in Equation (20).
With the checkerboard placed at each of the four locations in turn, a pair of stereo images is taken. The index coordinates of the calibration data-points and the test data-points can be determined either automatically or manually.
By putting the 3D coordinates and index coordinates of the calibration data-points together, we obtain Table 1, which contains the data needed for calibrating the equation of 3D inverse projection of binocular vision (i.e., Equation (20)).
With the data listed in Table 1, we obtain the following result for the calibrated matrix:
Now, we use the index coordinates in Table 1, the calibrated matrix, and Equation (20) to calculate the 3D coordinates of the calibration data-points. By combining these calculated 3D coordinates with the data in Table 1, we obtain Table 2, which allows us to compare the true values of the calibration data-points' 3D coordinates with the calculated values.
In Table 2, the values in columns 2, 5, and 8 are the ground-truth (X, Y, Z) coordinates of the calibration data-points. The values in columns 3, 6, and 9 are the (X, Y, Z) coordinates computed using Equation (20).
Similarly, we use the index coordinates of the test data-points, the calibrated matrix, and Equation (20) to calculate the 3D coordinates of the test data-points. Then, by combining the true and calculated values of these 3D coordinates, we obtain Table 3, which helps us appreciate the usefulness and validity of Equation (20).
In Table 3, the values in columns 2, 5, and 8 are the ground-truth (X, Y, Z) coordinates of the test data-points. The values in columns 3, 6, and 9 are the (X, Y, Z) coordinates computed using Equation (20).
In view of the low resolution of the digital images (i.e., pixels) and the small size of the checkerboard (i.e., cm divided into squares), we could say that the comparison results shown in Table 2 and Table 3 are good enough to experimentally validate Equation (20). In practice, images with much higher resolution and checkerboards of larger size will naturally increase the accuracy of binocular vision calibration as well as the accuracy of the 3D coordinates calculated with Equation (20).
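One convenient way to summarize such table-based comparisons is with per-axis error statistics. The sketch below uses made-up coordinate values, not the paper's measurements, purely to illustrate the computation:

```python
import numpy as np

# Hypothetical ground-truth and computed coordinates (metres), standing in
# for two columns of Table 2 or Table 3; the actual values are in the paper.
truth = np.array([[0.10, 0.05, 1.00],
                  [0.10, 0.05, 1.50],
                  [0.10, 0.05, 2.00],
                  [0.10, 0.05, 2.50]])
computed = np.array([[0.11, 0.04, 1.03],
                     [0.09, 0.06, 1.46],
                     [0.12, 0.05, 2.05],
                     [0.10, 0.04, 2.43]])

err = computed - truth
# Per-axis mean absolute error and RMSE summarize the calibration accuracy.
mae = np.abs(err).mean(axis=0)
rmse = np.sqrt((err ** 2).mean(axis=0))
for axis, m, r in zip("XYZ", mae, rmse):
    print(f"{axis}: MAE = {m:.4f} m, RMSE = {r:.4f} m")
```

In these made-up numbers the Z errors dominate, which mirrors the usual behavior of stereo reconstruction: depth is the hardest coordinate to recover.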
5.2. Comparative Study with Textbook Solution of Computing 3D Coordinates
To convince readers of the accuracy of the 3D coordinates computed with the newly discovered Equation (20), we also include the 3D coordinates computed with Equation (16), the conventional method taught in textbooks on computer vision and robot vision.
The use of Equation (16) requires us to first calibrate the forward projection matrices of both the left and right cameras, as described in Equations (11) and (12). With the same dataset from Table 1, we obtain the two calibrated forward projection matrices for the left and right cameras, given in Equations (25) and (26), respectively.
From the index coordinates listed in Table 1 and the two forward projection matrices in Equations (25) and (26), Equation (16) yields the computed 3D coordinates of the calibration data-points. By combining these calculated 3D coordinates with the data in Table 1, we obtain additional entries in Table 2, which allow us to compare the true and calculated values of the calibration data-points' 3D coordinates.
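For readers who wish to reproduce this conventional route, the sketch below implements the standard linear triangulation from two 3 x 4 forward projection matrices, which is the usual textbook form behind Equation (16). The camera parameters here are hypothetical stand-ins, not the calibrated matrices of Equations (25) and (26):

```python
import numpy as np

# Hypothetical stand-ins for the forward projection matrices of Eqs. (25)
# and (26): a 500 px focal length, a 320x240 principal point, and a 0.1 m
# horizontal baseline between the two cameras (all assumed values).
f, cx, cy = 500.0, 320.0, 240.0
K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
P_left = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_right = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

def project(P, X):
    """Forward-project a 3D point to pixel coordinates."""
    h = P @ np.append(X, 1.0)
    return h[:2] / h[2]

def triangulate(P_l, P_r, pix_l, pix_r):
    """Textbook linear triangulation in the spirit of Equation (16)."""
    rows = []
    for P, (u, v) in ((P_l, pix_l), (P_r, pix_r)):
        rows.append(u * P[2] - P[0])   # one linear constraint per pixel axis
        rows.append(v * P[2] - P[1])
    A = np.array(rows)[:, :3]
    b = -np.array(rows)[:, 3]
    # One transpose, one inverse, and three matrix multiplications per point.
    return np.linalg.inv(A.T @ A) @ (A.T @ b)

X_true = np.array([0.2, -0.1, 2.0])
X_est = triangulate(P_left, P_right,
                    project(P_left, X_true), project(P_right, X_true))
print(np.allclose(X_est, X_true, atol=1e-6))
```

With noise-free pixel coordinates the reconstruction is exact; with real, quantized pixels the residual of this least-squares step is what the error columns of Tables 2 and 3 measure.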
In Table 2, the values in columns 2, 5, and 8 are the ground-truth (X, Y, Z) coordinates of the calibration data-points. The values in columns 4, 7, and 10 are the (X, Y, Z) coordinates computed using Equation (16).
Similarly, from the index coordinates listed in Table 1 and the two forward projection matrices in Equations (25) and (26), Equation (16) yields the 3D coordinates of the test data-points. Then, by combining the true and calculated values of these 3D coordinates, we obtain additional entries in Table 3.
In Table 3, the values in columns 2, 5, and 8 are the ground-truth (X, Y, Z) coordinates of the test data-points. The values in columns 4, 7, and 10 are the (X, Y, Z) coordinates computed using Equation (16).
If we compare the data in columns 3, 4, 6, 7, 9, and 10 of Table 2, it is clear that the accuracy obtained with the newly discovered solution (i.e., Equation (20)) is much better than that obtained with the conventional solution (i.e., Equation (16)). In particular, the errors in the Z coordinates are greatly reduced with the proposed solution.
Similarly, if we compare the data in columns 3, 4, 6, 7, 9, and 10 of Table 3, the same conclusion can be drawn: the proposed solution produces 3D coordinates of higher accuracy than the conventional solution found in textbooks on computer vision and robot vision.
On top of achieving better accuracy than the textbook solution, the newly discovered solution requires only a single multiplication between a matrix and a vector, as shown in Equation (20).
However, if we examine Equation (16), it is clear that the conventional way of computing the 3D coordinates at each point or pixel requires one matrix transpose, one matrix inverse, and three matrix multiplications. Hence, the newly discovered solution minimizes the computational workload for each set of 3D coordinates. This helps explain why human eyes do not experience fatigue or heating despite the huge quantity of visual signals coming from each eye's imaging cells.
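The workload gap can be made concrete with a rough per-point count of floating-point operations. This sketch assumes Equation (20) reduces 3D recovery to one matrix-vector product with a hypothetical 7-element input vector, and that the textbook route of Equation (16) works with a 4 x 3 constraint matrix A, as in standard two-view linear triangulation:

```python
# Rough per-point floating-point operation counts (assumed shapes).
n = 7                                     # hypothetical input-vector length for Eq. (20)
matvec_flops = 3 * (2 * n - 1)            # Eq. (20): one 3 x n matrix-vector product

# Textbook route (Eq. (16)) per point, with A of size 4 x 3:
AtA = 3 * 3 * (2 * 4 - 1)                 # A^T A      (3x4 times 4x3)
inv = 3 ** 3                              # ~k^3 for inverting the 3x3 matrix
Atb = 3 * (2 * 4 - 1)                     # A^T b
solve = 3 * (2 * 3 - 1)                   # (A^T A)^-1 times (A^T b)
textbook_flops = AtA + inv + Atb + solve

print(matvec_flops, textbook_flops)       # the matrix-vector route is cheaper per point
```

The exact totals depend on the assumed shapes, but the qualitative point stands: the one-multiplication route of Equation (20) does a small constant amount of work per point, while the Equation (16) route repeats a transpose, an inverse, and three multiplications for every point.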