4.1. Support Vector Machine Classifier
In the first experiment with SVMs, the model was trained with unbalanced samples from 256 image patches of size 120x120 pixels and 10 bands. The model was trained with C=2.0, the RBF kernel, gamma='scale', an unlimited number of iterations, and decision_function_shape='ovr'. The achieved validation accuracy was 79.3%. Since the dataset is quite unbalanced, a reasonably high accuracy is achieved by a model that is tuned to correctly classify the 3 most frequent classes (1, 2, 5) while misclassifying the least frequent ones (0, 3). The next step was to balance the dataset, considering the same number of samples for all classes. This number was defined as the minimum number of occurrences among the 5 classes.
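As an illustration, the sketch below shows how the reported SVC configuration and the subsequent balancing-by-undersampling step could be reproduced with scikit-learn; the data arrays are synthetic placeholders standing in for the flattened Sentinel-2 pixels.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic placeholders for the flattened pixels: 10 band values per
# pixel and 5 land cover classes (0..4).
X = rng.normal(size=(5000, 10))
y = rng.integers(0, 5, size=5000)

def undersample_to_minority(X, y):
    """Balance classes by keeping, for every class, as many samples as the
    least frequent class has (random undersampling)."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]

X_bal, y_bal = undersample_to_minority(X, y)

# SVC configuration reported for the first experiment.
model = SVC(C=2.0, kernel="rbf", gamma="scale",
            max_iter=-1,                       # no iteration limit
            decision_function_shape="ovr")
model.fit(X_bal, y_bal)
```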
Another direction that was explored was applying Principal Component Analysis (PCA) to reduce the number of features per sample from 10 (bands) to 3 (principal components), which together explain around 99% of the variance.
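A minimal sketch of this reduction, assuming the pixels are already flattened into a (samples, bands) array (synthetic here):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))   # placeholder for (n_pixels, 10 bands)

# Keep the 3 principal components with the largest explained variance.
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X)

# On the real data the first 3 components explain around 99% of the
# variance; on this random placeholder the ratio is naturally lower.
print(pca.explained_variance_ratio_.cumsum())
```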
Figure 3 shows the result of plotting the samples after projecting them onto the 2D/3D space defined by the two/three principal components of PCA that explain most of the variance. The 3D projection makes it easier to visualize the clustering of samples belonging to the same class. Visual analysis of this figure reveals that the classes exhibit significant overlap in the 3D space, which makes separation difficult. Grid search cross-validation (CV) was applied to find the best values for the hyperparameters C and gamma of the SVM model, and the combination that allowed the best accuracy was selected. For some hyperparameter combinations, it was observed that the computation time necessary to run a "batch" became extremely high. The global test accuracy of the model was 59.1%. Finally, we dropped PCA and kept the 10 original features per pixel. Using 2048 image patches with 10 features per pixel (corresponding to 10 Sentinel-2 bands) allowed us to achieve the highest accuracy with SVM: 68.6%.
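The hyperparameter search described above could be reproduced along the following lines; the parameter grid below is hypothetical, since the values actually searched are not listed in the text, and the data is again a synthetic placeholder.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))    # placeholder for the 3 PCA features
y = rng.integers(0, 5, size=1000)

# Hypothetical grid; the values actually searched are not given in the text.
param_grid = {"C": [0.1, 1.0, 2.0, 10.0],
              "gamma": ["scale", 0.01, 0.1, 1.0]}

# 5-fold cross-validated grid search, parallelized over all cores.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```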
Evaluation metrics for the best SVM model are presented in
Table 1 and the confusion matrix is in
Figure 4. Considering the F1-score, the worst result belongs to the wetlands class (class 3). Despite the improvement of the SVM model obtained by balancing the dataset, optimizing the hyperparameters, and reducing the number of features with PCA, the final accuracy of 68.6% is not a satisfactory result and reveals that SVM is not the best fit to classify the land cover. Moreover, even a moderate number of image patches, such as 1024, makes training very slow.
4.2. Random Forest Classifier
Training of the RF was done with 1024 image patches of 120x120 pixels each. The number of land cover classes was 5, and classes were balanced by considering a number of pixels per class equal to that of the least frequent class. PCA was applied to select the 3 features that explain most of the variance, the criterion used to evaluate the splits was log_loss, the RF included 100 decision trees, and bootstrap=False, meaning the whole dataset is used to train each tree. The test accuracy score achieved by the trained model is .
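A sketch of this first configuration with scikit-learn (criterion="log_loss" requires scikit-learn >= 1.1; the data is a synthetic placeholder):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))    # placeholder for the 3 PCA features
y = rng.integers(0, 5, size=5000)

# First RF configuration described above.
rf = RandomForestClassifier(n_estimators=100,
                            criterion="log_loss",
                            bootstrap=False)  # each tree sees the whole set
rf.fit(X, y)
```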
Next, the number of image patches was increased to 2048, the number of features per pixel remained at 3, the evaluation criterion was changed to gini, the number of decision trees was kept at 100, bootstrap=True, and max_samples=0.8. The test accuracy score achieved by the trained model is . Increasing the number of decision trees to 200 was also tried, but there was no improvement in model performance.
Since using only 3 features per pixel produced poor results, it was decided to remove PCA and keep the 10 original features per pixel. Considering 2048 image patches, 10 features per pixel (corresponding to 10 Sentinel-2 bands), 5 land cover classes, balanced class frequencies, the gini evaluation criterion, bootstrap=True, max_samples=0.8, and max_features=3, the test accuracy score achieved by the trained model rose to . The confusion matrix is presented in Figure 5. This confusion matrix reveals that the percentage of samples correctly classified is 68.0% for class 0, 67.0% for class 1, 72.0% for class 2, 55.0% for class 3, and 93.0% for class 4. Precision, recall, and F-score metrics for the trained RF model are shown in Table 2.
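For reference, this best-performing configuration and the computation of the reported metrics could look as follows in scikit-learn; the data and the train/test split are synthetic placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))   # placeholder: 10 Sentinel-2 bands per pixel
y = rng.integers(0, 5, size=5000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Best-performing RF configuration reported above.
rf = RandomForestClassifier(n_estimators=100, criterion="gini",
                            bootstrap=True, max_samples=0.8, max_features=3)
rf.fit(X_tr, y_tr)

y_pred = rf.predict(X_te)
print(confusion_matrix(y_te, y_pred, normalize="true"))  # rows sum to 1
print(classification_report(y_te, y_pred))               # per-class P/R/F1
```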
4.3. U-Net
The U-Net model was implemented with the TensorFlow library, specifically the Keras API. It was trained with the Adam optimizer, the categorical cross-entropy loss, and the ModelCheckpoint, EarlyStopping, and ReduceLROnPlateau callbacks, for 200 epochs. Models were evaluated based on accuracy, precision, recall, and F1-score metrics.
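A minimal sketch of this training setup with the Keras API is shown below; the callback arguments (patience values, checkpoint path) are illustrative assumptions, as they are not given in the text.

```python
import tensorflow as tf

def compile_and_train(model, train_ds, val_ds):
    """Train a Keras U-Net (per-pixel softmax over the classes) with the
    setup described above. Patience values and the checkpoint path are
    illustrative, not taken from the paper."""
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    callbacks = [
        tf.keras.callbacks.ModelCheckpoint("unet_best.h5",
                                           save_best_only=True),
        tf.keras.callbacks.EarlyStopping(patience=20,
                                         restore_best_weights=True),
        tf.keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5),
    ]
    return model.fit(train_ds, validation_data=val_ds,
                     epochs=200, callbacks=callbacks)
```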
A list of all experiments carried out, as well as the results obtained, can be seen in Table 3. The experiments accomplished with U-Net and the BigEarthNet dataset are summarized below.
The experiment with all 43 CLC level 3 classes will work as our baseline, i.e., with all the other experiments we will try to improve on its results. The overall accuracy achieved was , with most misclassifications occurring between very similar classes, such as continuous urban fabric and discontinuous urban fabric. The class with the worst results was green urban areas, which was misclassified as urban fabric or forests.
The second experiment tried to improve on the previous attempt through the insertion of the Normalized Difference Vegetation Index (NDVI). NDVI was chosen because of its popularity in the literature, for example in [20,21]. The final results were worse than in the previous scenario; analysing each class individually shows that some classes were completely misclassified, which did not happen in the previous experiment. Taking these results into consideration, the idea of using other spectral indices was abandoned.
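For completeness, NDVI is computed as (NIR - Red) / (NIR + Red), where for Sentinel-2 the NIR and Red bands are B8 and B4. A sketch of how the index could be appended to the input as an extra channel follows; the band ordering assumed in the comments is an assumption, not taken from the paper.

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """NDVI = (NIR - Red) / (NIR + Red); eps guards against division by
    zero. For Sentinel-2, NIR is band B8 and Red is band B4."""
    nir = nir.astype(np.float32)
    red = red.astype(np.float32)
    return (nir - red) / (nir + red + eps)

# Stack the index onto the 10 input bands as an 11th channel. The band
# order B2,B3,B4,B5,B6,B7,B8,B8A,B11,B12 (B4 at index 2, B8 at index 6)
# is an assumption.
patch = np.random.rand(120, 120, 10).astype(np.float32)
patch_11 = np.concatenate(
    [patch, ndvi(patch[..., 6], patch[..., 2])[..., None]], axis=-1)
```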
The next step taken to improve the results was to reduce the number of land cover classes. The experiment with 15 CLC level 2 classes improved the overall accuracy to
. The normalized confusion matrix for the segmentation in 15 classes with U-Net is shown in
Figure 6. While the baseline presented F1-score values of the order of 0.4 for some classes, the lowest F1-score of this model is 0.65.
The automatic classification of land cover in 15 classes is still a very ambitious objective, and therefore another model was trained to classify the land cover in only the 5 CLC level 1 classes. The trained U-Net model achieved an overall accuracy of , the best result among all experiments. Evaluation metrics for this model are presented in Table 4 and the confusion matrix is in Figure 7. Considering the F1-score, the worst result belongs to the wetlands class (class 3).
Two attempts with a combination of CLC classes from levels 1 and 2 were carried out. The first one used 11 classes and obtained an overall accuracy of , a result worse than that of the experiment with 15 classes.
The second attempt used 8 classes and its overall accuracy was
, a result very similar to the experiment with 5 classes (
Table 5). The chosen level 2 classes are those that were best classified by the U-Net trained with the level 2 classes. The remaining level 2 classes were collapsed into the corresponding level 1 classes. The CLC hierarchy was maintained, i.e., only level 2 classes belonging to the same level 1 class were merged. Confusion matrix analysis (
Figure 8) shows that 11% of the samples belonging to class 0 (artificial surfaces) are classified as class 1 (agricultural areas), 12% of class 2 (pastures) is classified as class 1 (agricultural areas), 17% of class 4 (inland wetlands) is classified as class 3 (forest and semi-natural areas), and 18% of class 5 (maritime wetlands) is classified as class 7 (maritime waters).
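This collapsing of level 2 labels into the merged scheme can be expressed as a simple lookup table, as in the sketch below; the mapping values are illustrative only, since the actual grouping follows Table 5.

```python
import numpy as np

# Illustrative lookup table from the 15 CLC level 2 ids to the 8 merged
# classes; the actual grouping used in the paper is given in Table 5.
LEVEL2_TO_MERGED = np.array([0, 0, 1, 1, 2, 1, 1, 3, 3, 3, 4, 5, 6, 7, 7])

def collapse_classes(mask, table=LEVEL2_TO_MERGED):
    """Collapse a (H, W) mask of level 2 labels into merged class ids with
    a vectorized table lookup; the table merges level 2 classes only within
    their level 1 parent, preserving the CLC hierarchy."""
    return table[mask]

merged = collapse_classes(np.random.randint(0, 15, size=(120, 120)))
```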
Experiments with the LandCoverPT dataset, the U-Net model, and level 1 land cover classes were also carried out. The results of these experiments were worse than those achieved with the BigEarthNet dataset, quantified as an overall accuracy of . The classes with the worst results in this experiment were artificial surfaces and wetlands. A possible explanation for these results is the low number of samples containing those classes in the LandCoverPT dataset.
Visual inspection of the predictions of the trained models, and of the corresponding ground truth, revealed that the classification errors were predominantly located at the boundaries of the patches (Figure 9). The most likely explanation for this fact is the existence of mixed pixels. Another explanation, mentioned in the literature, is the lower ability of U-Net to correctly segment pixels at object boundaries.
Figure 9 shows a satellite image patch, randomly chosen from the test set. The leftmost column of the figure shows the ground truth masks for the 5 level 1 classes (0 to 4). The next column shows the model prediction for the same classes. The upper right corner presents 4 of the 10 bands of the input patch, in this case the ones with the best spatial resolution: red, green, blue, and near infrared. Misclassified pixels are shown in yellow in the centre right part of the figure.
When we evaluate the trained models on a dataset distinct from the train/validation set, the results are inferior. It was observed that several land cover parcels, classified as agricultural areas in the 2018 CLC map (yellow regions in
Figure 10), are misclassified by our models as artificial surfaces (red regions in
Figure 10) or forests and semi-natural areas (green regions in
Figure 10).
Another problem, observed in some parts of the automatically generated map, is the discontinuity between patches. This problem occurs because the masks generated by the model are obtained patch by patch, where the patch size is 120x120 pixels.
A possible solution is to discard the pixels on the periphery of the patches and use only the inner part (Figure 11). The innermost pixels have more contextual information and better accuracy than peripheral pixels, as can be seen in Table 6. The drawback of this solution is the longer time it takes to generate the land cover map. For example, considering only an inner part of pixels in each patch, the time to classify the same land area will increase times.
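A sketch of this sliding-window strategy is shown below. The inner size of 60x60 is an illustrative value, not the one used in the paper: with it, each prediction keeps only a quarter of the patch, so four times as many patches must be processed to cover the same area.

```python
import numpy as np

PATCH = 120                    # patch size fed to the network
INNER = 60                     # inner region kept (illustrative value)
MARGIN = (PATCH - INNER) // 2

def predict_map(image, predict_patch):
    """Slide over the image with stride INNER, predict each PATCH x PATCH
    window, and keep only its central INNER x INNER block, discarding the
    less reliable border pixels. The outer MARGIN-wide frame of the output
    is left unfilled in this sketch."""
    h, w, _ = image.shape
    out = np.zeros((h, w), dtype=np.int64)
    for top in range(0, h - PATCH + 1, INNER):
        for left in range(0, w - PATCH + 1, INNER):
            pred = predict_patch(image[top:top + PATCH, left:left + PATCH])
            out[top + MARGIN:top + MARGIN + INNER,
                left + MARGIN:left + MARGIN + INNER] = \
                pred[MARGIN:MARGIN + INNER, MARGIN:MARGIN + INNER]
    return out
```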
Figure 12 contains a complete and continuous land cover map for continental Portugal. This map was generated with the U-Net model, trained on the BigEarthNet dataset and 5 classes. Sentinel-2 products, downloaded from the
https://scihub.copernicus.eu website, were used to generate the full map. The images were captured by the satellite on 7 July 2021 and 22 August 2021, and have a maximum cloud percentage of 5%. Because products with minimal cloud cover were needed, it was not possible to use images all captured on the same day. To visualize the map we used the QGIS tool, in which the various parcels of the map generated by the model were merged and trimmed with the help of a shapefile that defines the boundaries of mainland Portugal.