2.2.2. Georeferencing
The georeferencing algorithm is largely based on the aerial photo stitching solution proposed by Luna Yue Huang [13]. Our modification consists of direct georeferencing to the Universal Transverse Mercator (UTM) coordinate system.
The first part of the georeferencing procedure is the initial georeferencing of images based on EXIF metadata. These contain information about the geographic coordinates, yaw, and altitude of the drone camera at the time the image was taken. This information, along with the angle of field of view taken from the camera specifications, allows estimation of the affine transformation parameters of translation (t_x, t_y), scale (s_x, s_y), and counter-clockwise rotation (θ) that embed the image in the UTM coordinate system. This is not an accurate estimation, due to the low accuracy of the UAV's GPS geolocation and the high sensitivity of the camera viewing area to shifts caused by gusts of wind. In this estimation the scale parameters s_x and s_y are equal, and the shear coefficients (g_x, g_y) are 0. The transformation parameters allow us to obtain the transformation matrix A based on Equation 1.

A = [ s_x·cos θ   −s_y·sin θ   t_x ]
    [ s_x·sin θ    s_y·cos θ   t_y ]   (1)
    [     0            0        1  ]
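As a minimal sketch, the initial EXIF-based transformation matrix can be assembled from these parameters as follows (function and variable names are illustrative, not taken from the authors' implementation; with zero shear and equal scale on both axes the matrix reduces to a similarity transform):

```python
import math

def affine_matrix(tx, ty, s, theta):
    """Build the 3x3 affine matrix for the initial EXIF-based
    georeferencing: equal scale on both axes, counter-clockwise
    rotation by theta (radians), zero shear."""
    c, si = math.cos(theta), math.sin(theta)
    return [
        [s * c, -s * si, tx],
        [s * si, s * c, ty],
        [0.0, 0.0, 1.0],
    ]
```

In practice tx and ty would come from the converted GPS coordinates, theta from the yaw angle, and s from the altitude and field of view.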
Based on the initial georeferencing, pairs of overlapping images are found. To ensure that all possible pairs are found even when errors in the initial georeferencing may suggest a lack of overlap, the footprints are expanded (buffered) by a dilation operation with a given padding value. Each pair of images is aligned and a relative transformation matrix A_rel is found by a method well established in the field of image stitching: i) finding corresponding points (keypoints) in both images using the scale-invariant feature transform (SIFT) [14], ii) matching keypoints using fast approximate nearest-neighbour search (FLANN) [15], and iii) estimating the best transformation matrix using random sample consensus (RANSAC) [16]. Relative transformations are verified in two stages: i) since we assume that the images are taken from the same altitude, the scaling factor of the relative transformation of two overlapping images must be approximately equal to 1; if the scaling factor of a relative transformation is outside the range 0.9–1.1, the relative transformation is considered incorrect. ii) as all images are taken in nadir orientation, the relative transformations should not perform a shearing operation; when there is no shear, the absolute values of the elements of the relative transformation matrix A_rel satisfy the conditions |a_11| = |a_22| and |a_12| = |a_21|. If the deviation between these values is greater than 0.1, the relative transformation is considered incorrect. If the relative transformation is found to be incorrect, then, following Luna Yue Huang's solution, an attempt is made to establish a relative transformation using the same pair of images resized to half the resolution. Such a procedure yields a different set of keypoints used for scale determination. If the relative transformation obtained in this way again fails verification, the pair of photos is discarded.
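The two-stage verification can be sketched as follows (a simplified illustration, not the authors' implementation; the scaling factor is taken here as the square root of the absolute determinant of the 2×2 linear part, which is one common convention):

```python
import math

def verify_relative_transform(A, scale_range=(0.9, 1.1), max_dev=0.1):
    """Check a 3x3 relative transformation matrix against the two
    plausibility rules: scale close to 1 and no shear."""
    a, b = A[0][0], A[0][1]
    c, d = A[1][0], A[1][1]
    # Rule i: the scaling factor must stay within the accepted range.
    scale = math.sqrt(abs(a * d - b * c))
    if not scale_range[0] <= scale <= scale_range[1]:
        return False
    # Rule ii: without shear, |a11| ~= |a22| and |a12| ~= |a21|.
    if abs(abs(a) - abs(d)) > max_dev or abs(abs(b) - abs(c)) > max_dev:
        return False
    return True
```

A pair whose transformation fails both the full-resolution and the half-resolution attempt would then be discarded.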
Based on the obtained pairs, an undirected graph is constructed [17], where each node is an aerial photo and each edge is a valid relative transformation between a pair of photos. The connected components are then extracted from the graph. In graph theory, a connected component is a group of vertices that are all directly or indirectly connected to each other. Only the images (nodes) from the connected component containing the largest number of nodes are used for further processing.
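Extracting the largest connected component can be sketched with a plain breadth-first search (an illustrative stand-in for the graph library cited above):

```python
from collections import deque

def largest_connected_component(nodes, edges):
    """Return the largest set of nodes mutually reachable through
    valid pairwise transformations (the edges of the graph)."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, best = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            n = queue.popleft()
            comp.append(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    queue.append(m)
        if len(comp) > len(best):
            best = comp
    return set(best)
```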
Iterating over each pair of images, the coordinates of the corners of the second image are calculated in the pixel coordinate system of the first image using Equation 2. This idea is also visualised in Figure 1.

[x'  y'  1]^T = A_rel · [x  y  1]^T   (2)

where:
A_rel – relative transformation matrix between the 2nd image and the 1st image pixel coordinate system,
(x, y) – coordinates of a corner point expressed in the 2nd image pixel coordinate system,
(x', y') – coordinates of the same point expressed in the 1st image pixel coordinate system.
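Applying the relative transformation to a corner point, as described by Equation 2, amounts to a homogeneous-coordinate matrix–vector product (a minimal sketch; names are illustrative):

```python
def transform_point(A_rel, x, y):
    """Map a pixel coordinate from the 2nd image into the 1st image's
    pixel coordinate system using homogeneous coordinates."""
    xh = A_rel[0][0] * x + A_rel[0][1] * y + A_rel[0][2]
    yh = A_rel[1][0] * x + A_rel[1][1] * y + A_rel[1][2]
    w = A_rel[2][0] * x + A_rel[2][1] * y + A_rel[2][2]
    # For an affine matrix the last row is [0, 0, 1], so w == 1;
    # dividing keeps the sketch valid for general homographies too.
    return xh / w, yh / w
```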
Optimisation of the georeferencing of all images involves tuning the absolute geographic transformation parameters (t_x, t_y, s_x, s_y, θ) to make the relative transformations recovered from them as close as possible to the relative transformations obtained earlier by aligning each pair separately. Recovery of the relative transformation matrix Â_rel from the absolute geographic transformation matrices A_1 and A_2 of two georeferenced images is obtained according to Equation 3.

Â_rel = A_1^{-1} · A_2   (3)

The shear parameters g_x and g_y are not tuned during the optimization. Although tuning these parameters further improves the matching between pairs of photos, it also introduces an unacceptable shearing of the entire mosaic of photos.
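Recovering the relative transformation from two absolute transformations (Equation 3) can be sketched as follows, exploiting the fact that an affine matrix with last row [0, 0, 1] can be inverted in closed form (an illustrative sketch, not the authors' code):

```python
def invert_affine(A):
    """Invert a 3x3 affine matrix whose last row is [0, 0, 1]."""
    a, b, tx = A[0]
    c, d, ty = A[1]
    det = a * d - b * c
    ia, ib = d / det, -b / det
    ic, id_ = -c / det, a / det
    return [
        [ia, ib, -(ia * tx + ib * ty)],
        [ic, id_, -(ic * tx + id_ * ty)],
        [0.0, 0.0, 1.0],
    ]

def matmul3(A, B):
    """Multiply two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def recover_relative(A1, A2):
    """Image-2 pixels -> UTM (via A2) -> image-1 pixels (via A1^-1)."""
    return matmul3(invert_affine(A1), A2)
```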
Similarly to Equation 2, the coordinates of points expressed in the pixel reference system of the 2nd photo can be converted to coordinates expressed in the reference system of the 1st image using Equation 4.

[x'  y'  1]^T = Â_rel · [x  y  1]^T   (4)

where:
Â_rel – relative transformation matrix between the 2nd image and the 1st image pixel coordinate system, recovered from the absolute geographic transformations,
(x, y) – coordinates of a point expressed in the 2nd image pixel coordinate system,
(x', y') – coordinates of the same point expressed in the 1st image pixel coordinate system.
Optimisation is performed using the gradient descent method with a loss function L that consists of two components, L_pairs and L_centroids. The component L_pairs is the mean Euclidean distance between the corner points of the images obtained from the relative transformations recovered from the tuned absolute geographic transformations and the corner points obtained from the relative transformations estimated during pair alignment.

L_pairs = (1 / 4N) Σ_{i=1}^{N} Σ_{j=1}^{4} || p_{ij} − p̂_{ij} ||   (5)

where:
N – number of pairs,
p_{ij} – point of the j-th corner of the 2nd image of the i-th pair, expressed in the 1st image pixel coordinate system, estimated from pair alignment,
p̂_{ij} – point of the j-th corner of the 2nd image of the i-th pair, expressed in the 1st image pixel coordinate system, estimated using the relative transformation recovered from the absolute geographic transformations tuned during optimization.
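The corner-distance component can be sketched as a plain mean of Euclidean distances over all corners of all pairs (names are illustrative):

```python
import math

def corner_loss(corners_aligned, corners_recovered):
    """Mean Euclidean distance between corresponding corner points:
    one list entry per pair, four (x, y) corners per entry."""
    dists = []
    for pts_a, pts_r in zip(corners_aligned, corners_recovered):
        for (xa, ya), (xr, yr) in zip(pts_a, pts_r):
            dists.append(math.hypot(xa - xr, ya - yr))
    return sum(dists) / len(dists)
```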
The component L_centroids is the mean Euclidean distance between the geographic centroids of the tuned image footprints and the geographic centroids of the image footprints obtained during the initial georeferencing from EXIF data.

L_centroids = (1 / M) Σ_{i=1}^{M} || c_i − ĉ_i ||   (6)

where:
M – number of images,
c_i – centroid point of the i-th image obtained from EXIF data, expressed in the geographic coordinate system,
ĉ_i – centroid point of the i-th image obtained from the absolute geographic transformation tuned during optimization.
By minimizing the L_pairs component, the optimized absolute geographic transformations are adjusted so that the relative transformations recovered from them are as close as possible to the relative transformations previously obtained separately for each pair in the alignment process. Minimizing the L_centroids component ensures that the entire mosaic does not drift in any direction during the optimization. Equation 7 is the final formula for the loss function L. To minimise the impact of the L_centroids component on the relative matching of images, it is multiplied by a weighting factor w. The factor value was selected experimentally by qualitative assessment of the results for different values.

L = L_pairs + w · L_centroids   (7)
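The overall tuning loop can be illustrated on a toy problem in which only per-image translations are tuned, using finite-difference gradients (a didactic sketch, far simpler than the full optimisation of all affine parameters; the weighting value, learning rate, and step count are arbitrary placeholders, not the values used in the paper):

```python
import math

def georef_loss(t, rel_targets, anchors, w):
    """Toy L = L_pairs + w * L_centroids on a translations-only model:
    t[i] is the [tx, ty] of image i, rel_targets maps (i, j) to the
    desired offset t[j] - t[i], anchors are the EXIF centroids."""
    lp = sum(
        math.hypot(t[j][0] - t[i][0] - dx, t[j][1] - t[i][1] - dy)
        for (i, j), (dx, dy) in rel_targets.items()
    ) / len(rel_targets)
    lc = sum(
        math.hypot(t[i][0] - ax, t[i][1] - ay)
        for i, (ax, ay) in enumerate(anchors)
    ) / len(anchors)
    return lp + w * lc

def optimise_translations(t, rel_targets, anchors, w=0.01, lr=0.05,
                          steps=400, eps=1e-6):
    """Gradient descent with finite-difference gradients."""
    for _ in range(steps):
        base = georef_loss(t, rel_targets, anchors, w)
        grads = [[0.0, 0.0] for _ in t]
        for i in range(len(t)):
            for k in range(2):
                t[i][k] += eps
                grads[i][k] = (georef_loss(t, rel_targets, anchors, w)
                               - base) / eps
                t[i][k] -= eps
        for i in range(len(t)):
            for k in range(2):
                t[i][k] -= lr * grads[i][k]
    return t
```

The small weight on the centroid term lets the pairwise matching dominate while anchoring the mosaic near its EXIF positions, mirroring the roles of L_pairs and L_centroids described above.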
2.2.3. Thermal calibration
In order to minimise the temperature bias between overlapping images using gradient descent, a consistent dataset has to be prepared. If the reduction of the vignette effect has not yielded sufficient results and the photo overlap is large enough, the areas near the edges of the photos, where the vignette effect affects the temperature images the most, can be truncated (Figure 2).
For the two images of each pair, only the common part is retained, together with a mask that allows the irregular shape of the cutout to be reproduced from the rectangular array. The temperature cutouts of both images and the mask are resized to a square array of n × n pixels. During resizing, the temperatures are interpolated using the bilinear method, and the mask using the nearest-neighbour method. An example result of the common clip is shown in Figure 3; the example corresponds to the images shown in Figure 1 and Figure 2. The rotation is due to the transformation to an absolute geographic datum. The aspect ratio is not preserved because the cutout must be scaled to fill the whole area of the n × n square array.
By resizing the clips to n × n pixels, it was possible to create a coherent dataset of three arrays (the masks and the common parts of both images of each pair), each of size N × n × n (where N is the number of pairs), which could then be used in optimization with the gradient method.
Optimisation of the temperature values involves tuning an offset parameter b_i that is applied to each image according to Equation 8.

T̂_i = T_i + b_i   (8)

where:
T̂_i – temperature value with applied calibration,
T_i – temperature value before calibration,
b_i – offset of the i-th image tuned during optimization.
Optimisation uses a loss function L_T that consists of two components, L_MSE and L_mean. The component L_MSE is the pixelwise mean squared error between the calibrated temperature values of the first and second cutout images of a given pair. It is calculated with Equation 9.

L_MSE = (1 / N) Σ_{i=1}^{N} (1 / n²) Σ_{u,v} ( T̂_{i,1}(u, v) − T̂_{i,2}(u, v) )²   (9)

where:
N – number of pairs,
T̂_{i,1}(u, v) – value of the calibrated first image of the i-th pair at pixel coordinate (u, v),
T̂_{i,2}(u, v) – value of the calibrated second image of the i-th pair at pixel coordinate (u, v).
The component L_mean is the absolute difference between the mean of all uncalibrated images and the mean of the calibrated images. It is calculated with Equation 10.

L_mean = | T̄ − T̄_cal |   (10)

where:
T̄ – mean value of the uncalibrated images,
T̄_cal – mean value of the calibrated images.
By minimizing the L_MSE component, the bias between the temperature values in the overlapping images is minimized. The L_mean component ensures that the whole mosaic does not drift from its initial mean value during the optimization. The total loss is calculated using Equation 11. To minimise the impact of the L_mean component on the bias correction, it is multiplied by a weighting factor w_T. The factor value was selected experimentally by qualitative assessment of the results for different values.

L_T = L_MSE + w_T · L_mean   (11)
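The thermal calibration can likewise be illustrated on a toy single-pair example, tuning the two offsets by gradient descent on the pixelwise MSE plus a mean-drift penalty (a didactic sketch; the weighting value, learning rate, and step count are arbitrary placeholders):

```python
import math

def calibrate_offsets(t1, t2, w=0.1, lr=0.1, steps=300):
    """Tune per-image offsets b1, b2 for one pair of flattened
    temperature cutouts t1, t2 using analytic gradients of
    mean((t1 + b1 - t2 - b2)^2) + w * |(b1 + b2) / 2|."""
    b1 = b2 = 0.0
    n = len(t1)
    for _ in range(steps):
        diff = [(t1[k] + b1) - (t2[k] + b2) for k in range(n)]
        g_mse = 2.0 * sum(diff) / n          # dL_MSE/db1 (= -dL_MSE/db2)
        drift = (b1 + b2) / 2.0              # shift of the overall mean
        g_drift = 0.0 if drift == 0 else w * math.copysign(1.0, drift) / 2.0
        b1 -= lr * (g_mse + g_drift)
        b2 -= lr * (-g_mse + g_drift)
    return b1, b2
```

With a constant +1 bias between the two cutouts, the MSE term drives the offsets apart until the bias vanishes, while the drift term keeps b1 + b2 near zero so the mosaic's mean temperature is preserved.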