1. Introduction
Conservation decision making relies on the quantification and monitoring of forested ecosystem health across large landscapes. Spatially mapping rare, endangered, keystone, or invasive tree species is necessary to understand habitat quality, model future forest assemblages, protect a particular species, and manage ecosystem services, among other conservation goals. While the number of studies reporting tree species classifications are increasing, few classify species across large geographic extents [
1]. Large-scale species classifications have many logistical considerations that challenge accurate species mapping such as intraspecific variation [
1], canopy and understory species turnover [
2,
3], subcanopy shading [
4,
5], date of image acquisition [
4], and computational limitations [
6]. Further, traditional metrics for evaluating model performance do not capture the spatial variability of model accuracy [
7,
8] and are therefore impractical for classifications on large geographic scales. Due to the pressing need for these data in conservation applications, it is important to address these challenges associated with tree species classifications over large geographic extents.
Metrosideros polymorpha ('ōhi'a lehua) is an ideal overstory tree species to address these challenges due to its prevalence in many bioclimatic zones, high intra-specific variation, and the need for accurate spatial data of this keystone species endemic to Hawai'i (
Figure S1). Approximately half of the vegetative biomass [
9] and 80% of native forest basal area [
10] in Hawai'i is
M. polymorpha. As a major component of native Hawaiian forests,
M. polymorpha provides habitat to many endemic plants and animals [
11], is culturally important to Native Hawaiians [
12,
13], and sustains vital ecosystem services such as groundwater recharge [
14].
M. polymorpha, and therefore native Hawaiian forests, is in decline due to the introduction of invasive species and Rapid 'Ohi'a Dead (ROD), a widespread disease that has led to millions of
M. polymorpha crown mortalities [
15].
M. polymorpha not only represents a keystone species in decline, but its existence across multiple ecosystems from sea level to 7,150 m elevation and on a range of soil substrates from bare lava flows to late successional flows [
16] allows us to investigate methods of mapping a single species across diverse landscapes. While studies have compared the spectral similarity of plants across ecosystems, they largely investigate the effect of species turnover on spectral variation [
6,
17]. The
M. polymorpha model system [
18] allows us to assess the intraspecific variation of one species across bioclimatic zones. Further,
M. polymorpha is highly polymorphic, exhibiting heritable morphological and chemical differences across environmental gradients [
19,
20,
21,
22,
23]. On Hawai'i Island,
M. polymorpha has four described varieties, many of which hybridize naturally, that exist in specific habitats and have distinct morphologic, chemical, and spectral characteristics [
22,
23,
24].
Despite the prominence of
M. polymorpha on the landscape, spatial data of
M. polymorpha does not exist at the resolution necessary to inform many native forest conservation decision-making processes. For example, the Hawai'i Gap Analysis Project (HI-GAP) developed spatial data of forest classes including
M. polymorpha and mixed
M. polymorpha stands, but these data were developed in 2001 using 30 m Landsat imagery [
25]. While this map is useful to approximate locations where
M. polymorpha may exist, the resolution is too coarse for detailed spatial analyses, and it includes many classification errors due to the lack of high spatial and spectral information inherent in Landsat data. Current, high-resolution spatial information of
M. polymorpha is needed for watershed-level decision-making models being developed for Hawai'i Island [
26,
27,
28], refining ROD monitoring methods, and defining a baseline species distribution of
M. polymorpha to track future range shifts.
To develop novel and high-resolution (2 m x 2 m) spatial data of
M. polymorpha in support of these conservation efforts, we used a fusion of airborne imaging spectroscopy and light detection and ranging (LiDAR) data. Imaging spectroscopy is a process of image formation that captures reflectance across a continuous portion of the visible to short-wave infrared (VSWIR) spectrum in short wavelength intervals (~10 nm). This high-spectral resolution data captures surface chemistry [
29,
30], and because each species has a unique chemical fingerprint [
31], imaging spectroscopy allows for accurate species classifications [
2,
32,
33,
34,
35,
36,
37]. LiDAR, which uses pulsed lasers to quantify surface structure, has been fused with imaging spectroscopy data to reduce the spectral influence of canopy shading [
38,
39].
Using airborne imaging spectroscopy and LiDAR data processed over 28,000 km2 of area at 2 m spatial resolution, we developed a spatial dataset of M. polymorpha canopies across Hawai'i Island with two classification techniques, support vector machine (SVM) and spectral mixture analysis (SMA). We assessed how well these methods can classify a single highly polymorphic tree species across a large geographic area. Classification performance was assessed by test set accuracy metrics and comparison to estimated spatial probability densities calculated with gaussian process classification (GPC). The final products include an accurate, high-resolution dataset of a keystone species available to conservationists and decision-makers seeking to protect native Hawaiian forests.
2. Materials and Methods
Imaging spectroscopy data were collected across Hawai'i Island by the Global Airborne Observatory (GAO) in January 2019, with some regions filled in with the most recent data from previous campaigns in January 2016, July 2017, and January 2018 [
38]. In addition to the high-fidelity imaging spectrometer (380-2510 nm), the GAO houses a boresight aligned dual-laser LiDAR scanner. LiDAR data were used to generate a surface elevation map and precise time-synced position and orientation data that were used to orthorectify VSWIR spectroscopy data to 2 m x 2 m spatial resolution. After orthorectification, VSWIR data were corrected for atmospheric effects and processed to retrieve hemispherical-direction reflectance values with ACORN v6.0 (Atmospheric CORrection Now; AIG LLC; Boulder, CO) [
40,
41]. Cloud-free mosaics were developed by first removing clouds and cloud shadows from individual flight line level reflectance maps using a mixture of automated cloud detection provided by a trained neural network model and manual revision of the produced cloud and cloud shadow outlines. The cleaned flight line maps were then manually layered based on minimizing flight line edge artifacts as observed in red, green, blue composites. Mosaics were developed based on this layer order. VSWIR surface reflectance data across Hawai'i Island were then brightness normalized by dividing each VSWIR channel by the vector norm of the entire VSWIR spectrum for each pixel after removing bands affected by atmospheric water absorption features. Regions where differences between flight lines caused erroneous classifications were further processed using bidirectional reflectance distribution function (BRDF) adjustments. BRDF effects result from anisotropic scattering of remote sensing targets, when basic atmospheric correction model assumes a flat, evenly diffuse surface. Empirical kernel models were fit to the reflectance and observation angle data, and spectra were adjusted to a standard observation angle using the difference between the observed spectra and their modeled BRDF-spectra [
42]. Brightness normalization is a means to control for variation in reflectance caused by properties not related to foliar chemistry such as subpixel shade, leaf orientation relative to viewing and sun angles, and leaf volume [
43]. Next, the data were filtered to obtain pixels representing photosynthetic vegetation. Normalized difference vegetation index (NDVI) data were calculated using bands at 650 nm and 860 nm of the VSWIR data, and all pixels under a 0.7 NDVI threshold were removed. Understory and shaded portions of the canopy were filtered from the data using top-of-canopy height (TCH) surface maps generated from LiDAR and a shade mask generated using a ray tracing technique on the LiDAR-derived surface elevation map [
38,
44]. TCH surface maps were used to remove pixels below two meters.
2.1. Training Data Collection
In the summer of 2022 through spring 2023, 5366 canopies were delineated and identified as either
M. polymorpha or “other vegetation” (
Figure 1;
Figure S2). 1713 crowns represented
M. polymorpha canopies, and 3653 were of other species. Crowns of other species were grouped together and represent all the background vegetation spectra that the classification models discriminated from
M. polymorpha. Field data were collected using Garmin Glo GPS connected to tablets with GAO TCH, true and false color composites, and preliminary
M. polymorpha classifications developed using a support vector machine (SVM). These data were brought into the field to increase crown delineation accuracy. Data were collected using a combination of field excursions, helicopter surveys, and Google Street view. While data were collected across elevation and soil substrate age gradients, observations were primarily concentrated along roadways and other easily accessible sites (
Figure 1). In addition to canopy data collected specifically for the
M. polymorpha mapping effort, canopy delineations from Balzotti et al. [
32] and Weingarten et al. [
45] were included in the training data as they used similar methods of delineating crowns in 2019 GAO data. Canopy spectra were generated by averaging all pixels from the filtered reflectance data within each tree crown (
Figure S2).
2.2. Species Classification
We compared the performance of two classification algorithms -- support vector machine (SVM) and spectral mixture analysis (SMA) -- in distinguishing M. polymorpha from other vegetation across Hawai'i Island. For both classification algorithms, crown data were randomly separated into training (70%) and test (30%) datasets to assess model performance.
SMA assumes that an image pixel spectrum is the linear combination of the abundant materials weighted by their fractional coverage, and that this mixed signal can be unmixed using combinations of “pure” endmember spectra representing different thematic classes. This method has been successfully used for forest species classifications across many ecosystems [
33,
34], including those in Hawai'i [
36]. We used Multiple Endmember Spectral Mixture Analysis [
46] on the full VSWIR spectra with automatic band selection, which allows endmembers and the number of classes used for SMA to change on a per-pixel basis. We used a two endmember MESMA model, which assumes that each pixel is either
M. polymorpha or other vegetation plus a shade fraction [
46,
47]. Because we brightness normalized the data, no shade was used in the fitting procedure.
As computational time increases with the number of possible endmember combinations and endmembers from the same class are often spectrally similar, we pruned the spectral library by clustering training spectra and creating new endmember spectra from the mean reflectance of similar endmembers. Endmember similarity was determined using hierarchical clustering of the spectra within the training data, with a distance matrix computed using the spectral angle. To determine the optimal maximum inter-cluster distance, we tested a range of distances and used the one that achieved the highest accuracy on the test dataset. For clusters with endmembers representing both M. polymorpha and other vegetation, we averaged spectra within the cluster based on their classification to create multiple endmembers. Other parameters within MESMA such as maximum RMSE and shade threshold did not affect accuracy. The MESMA model used to classify GAO data across Hawai'i Island included a spectral library developed by reducing all the training data available into 2238 training points via the hierarchical clustering described above. While most pixels were classified as entirely M. polymorpha or other vegetation due to the high resolution of the imaging spectroscopy dataset (2 m x 2 m), we used a threshold of 0.5 to develop a binary M. polymorpha presence/absence dataset.
Like spectral unmixing, SVM classifiers are commonly used in imaging spectroscopy applications [
2,
32,
48]. SVMs efficiently handle highly dimensional datasets by maximizing the distance between training data and decision boundaries between the two categories in feature space [
2,
49,
50]. SVMs typically outperform other supervised classifiers such as random forest in imaging spectroscopy classification applications [
48,
51,
52]. We optimized hyperparameter selection (kernel coefficient and regulation parameter) of a radial kernel SVM using a grid search in the scikit-learn package (version 0.24.1) [
53] in Python (version 3.6.9). We used Youden’s J statistic, which uses the sensitivity and specificity of the model, to determine the optimal classification threshold [
54].
Throughout the model training process, we noted that traditional classification metrics such as accuracy did not sufficiently describe model performance, particularly with respect to the spatial distribution of the model predictions. To characterize the spatial variability in model performance, we compared SVM- and spectral unmixing- derived
M. polymorpha maps with an independently trained spatial probability density map estimated using gaussian process classification (GPC) in scikit-learn with a radial kernel [
53]. GPC is a probabilistic classification technique that is used to assign expected class probabilities with Bayes theorem [
55]. The GPC estimates the probability of observing
M. polymorpha as a function of the x- and y-coordinates of the habitat range, providing a benchmark to validate the spatial accuracy of the spectral models (
Figure S3). While the GPC as a predictor for
M. polymorpha presence is limited by the sampling design, it is a useful tool for validating the SVM and spectral unmixing results as it can identify areas that have high and low
M. polymorpha presence probability based on the training dataset. In regions without training data, the GPC has a middle probability (50/50) of
M. polymorpha presence. The GPC trained on the training dataset achieved a 94% accuracy on the test dataset.
SVM and spectral unmixing were then applied to reflectance data filtered to excluded shaded portions of the canopy, non-photosynthetic features, and vegetation less than 2 m. To fill gaps in the canopy resulting from the shade masking, classification data were interpolated using the inverse distance weighting technique available in the rasterio package fill module v. 1.2.10. We then compared the accuracies of the two models as well as their island-wide classifications of M. polymorpha relative to the Bayesian GPC. The best M. polymorpha classification was used to develop a high resolution (2 m x 2 m) canopy map. This high-resolution map was then down sampled into a coarse density product by aggregating the 2m x 2m binary pixels onto a 30m x 30m grid via the mean function.
5. Conclusion
Large-scale vegetation classifications have many practical uses in conservation decision-making [
63,
64,
65,
66]. For example, this island-wide dataset of the keystone species
M. polymorpha can be used to improve disease tracking models and as input for watershed-level decision-making models [
26,
27,
28]. Until recently, these decision-making processes relied on a mapping effort from 2001, and while these data were useful for approximating
M. polymorpha across Hawai'i Island, especially in regions with dense
M. polymorpha stands, it did not include up to 316 km2 of
M. polymorpha canopies, that are typically found in regions with low
M. polymorpha canopy density. To achieve routine species classifications at large spatial scales, we need to address not only the issues discussed above, but also questions of spatial resolution [70] and training data collection [
48]. The upcoming spaceborne imaging spectrometers will collect data at 30 m x 30 m resolution while the data used here were 2 m x 2 m. Further, this project required extensive fieldwork to collect over 5,000 training points, but designing better methods of planning sampling schemes across ecosystems and environmental gradients that would require less fieldwork but lead to similar results would make mapping projects like this more feasible.
Author Contributions
Conceptualization, MMS, GPA, and REM; methodology, all authors; data curation, NRV; fieldwork, MMS; code development, MMS, BS, MK; formal analysis, MMS, BS; writing—original draft preparation, MMS; writing—review and editing, all authors; funding acquisition, MMS and GPA. All authors have read and agreed to the published version of the manuscript.