2.3.1. In Situ Data Acquisition
In situ Chl-
a measurements (μgL⁻¹) were provided by the Manitoba Sustainable Development, Water Quality Management Section in 2019. Chl-
a data were collected from 32 stations distributed throughout the lake (14 stations in the NB, 15 in the SB, and three in the Narrows;
Figure 1) during the peak phytoplankton biomass season (July to October) between 2002 and 2018 [
19]. All Chl-
a data were measured using spectrophotometric methods (American Public Health Association (APHA) 10200H [
23]). Since satellite-derived Chl-
a information is limited to the uppermost water layer, we only used samples taken from the surface of the lake (≤ 0.1 m depth). Although using Chl-
a samples from deeper layers (≤ 0.5 m depth) could have substantially increased the number of
in situ and satellite matchup points, it reduced the performance of Chl-
a models (
Figure S1). Our criteria for
in situ Chl-
a sample selection resulted in no samples from the Narrows, as all samples there were from deeper than 0.1 m. Due to the log-normal distribution of Chl-
a in nature [
24], all Chl-
a data were logarithmically transformed (using natural log) prior to analysis.
2.3.3. Pre-Processing Landsat Level 1 Data
Partial atmospheric correction was applied to Landsat Level 1 data (
Figure 2) because Level 2 products may introduce higher uncertainty in aerosol calculations over inland waters, leading to erroneous results [
25,
26]. All pre-processing steps for transforming Landsat data to partially corrected satellite top-of-atmosphere (TOA) reflectance were performed in GEE (
Figure 2, Supplementary Material). Land pixels and those covered by cloud or cloud shadow during satellite image acquisition were masked out using a standard LW boundary shapefile and Pixel Quality Assessment bands provided with Landsat Level 2 products. The individual pixels were then pre-processed in three steps.
Step 1. Radiometric correction
Landsat Level-1 data are stored as digital numbers (DNs). During the radiometric correction, we recalibrated the DNs to the satellite TOA radiance (
Lλ) using the standard equation [
27] (1):
Where:
Lλ is TOA radiance for band λ,
DNλ is DN for band λ,
Gλ is the multiplicative rescaling factor for band λ, and
Bλ is the additive rescaling factor for band λ.
Step 2. Partial atmospheric correction:
Lλ values were corrected for Rayleigh scattering using a partial atmospheric correction algorithm [
28]. We used an inverse algorithm based on a simplified radiative transfer model to calculate Rayleigh scattering radiance (or Rayleigh path radiance) for each band (
Lr(λ)) [
28] (2):
Where:
ESUN is the mean solar exo-atmospheric irradiance,
Pr is the Rayleigh scattering phase function,
θs is the solar zenith angle (degrees),
θ is the satellite viewing angle (degrees) (set to 0°),
τr is the Rayleigh optical thickness, and
toz↑ and toz↓ are upward and downward ozone transmittance, respectively.
The phase function
Pr was calculated using (3):
Where:
Θ (the view zenith angle) is the scattering angle (180° − θs),
δ is the depolarization factor [
30], which denotes the polarization of anisotropic particles at right angles and is dependent on the wavelength, atmospheric pressure, and air mass (with the last two variables being constant values) [
31].
The Rayleigh optical thickness,
τr, was calculated using [
32,
33] (4):
Where:
λ is the band specific mid-wavelength value (nm).
toz↑ and
toz↓ were calculated using [
34] (5,6):
Where:
toz is the constant ozone optical thickness [
35].
Finally, the Rayleigh-corrected radiance;
was obtained by subtracting the Rayleigh scattering radiance (
Lr) from the TOA radiance (
Lλ) (7):
Step 3. Converting radiance to reflectance:
TOA reflectance for each band (
), was calculated using (8):
Where:
ρλ is TOA reflectance for wavelength range or band λ, and
d is the Earth-to-sun distance in astronomical units [
27].
2.3.4. Extracting Matchups
We extracted remote sensing reflectance (R
rs) for each band (B, G, R, and NIR;
Section 2.3.3) and paired it with the corresponding sampling points (
Section 2.3.1) in GEE (See Supplementary Material). To increase the chance of matching satellite observations with sampled Chl-
a, we considered Landsat images within ±3 days of the sampling dates. Larger temporal windows were not used due to the highly dynamic nature of blooms in LW [
36]. Satellite data were extracted from corresponding pixels to sampling point locations (single pixels; spatial windows of 1 × 1). To mitigate potential errors in digitization [
37], we also extracted R
rs using the median of pixels corresponding to sampling locations and their eight (spatial window of 3 × 3) and 24 neighboring pixels (spatial window of 5 × 5). Based on these criteria, a total of 42 samples (18 in the NB and 24 in the SB) from 28 stations (13 in the NB and 15 in the SB) were identified (
Figure 1,
Table 1). We found 9 Landsat images for the NB, 11 for the SB, and 20 for the entire LW matching the time and location of the sampling points, providing 23 matchup points in the NB, 30 in the SB, and 57 for the entire LW (
Table 1). We created 36 sets of matchups with different temporal windows (0 to ±3 days), different spatial windows (1 × 1, 3 × 3, and 5 × 5), and in each LW basin (NB-specific and SB-specific) as well as for the entire lake (LW-specific).
2.3.5. Model Calibration
For each set of matchups (
Section 2.3.4), we tested the performance of 44 models, including all possible bands and band ratios, as well as various commonly used multiband combinations (Supplementary Material,
Table S1) to predict Chl-
a in LW using regression analyses. Each model was fitted using the matchup natural log-transformed
in situ Chl-
a values against the partially corrected R
rs (with and without natural log-transformation) using simple linear regression equations (
Table S1). We then identified the Best Performing Model (BPM) for each set of matchups as the model that exhibited the highest coefficient of determination (R²) and was statistically significant (p < 0.05). Model calibration on the BPMs was performed using 5-fold cross-validation, chosen to balance bias and variance given the number of matchups available [
38,
39]. During model calibration, for each fold of the 5-fold cross-validation, the models were fitted to the training dataset (natural log-transformed
in situ Chl-
a values against partially corrected R
rs) using simple linear regression equations. We then estimated the average predictive performance of each model during the calibration stage using the average R² of the five cross-validation folds.
2.3.6. Model Validation
To assess the predictive capacity of the BPMs identified during the initial calibration phase for each set of matchups (
Section 2.3.5), we validated each model by applying the slope and intercept from each calibration to its associated test matchup set in the cross-validation fold (k = 5). The predictive performance of the BPMs was measured the predictive performance of the BPMs using Root Mean Square Error (RMSE; µgL
-1) for each cross-validation fold as follows (9):
Where:
represents the observed Chl-a values, and
represents the predicted Chl-a values.
We normalized the RMSE (NRMSE) for each cross-validation fold as follows (10):
Where:
is average observed Chl-a.
We calculated the Root Mean Squared Logarithmic Error (RMSLE; µgL
-1) as follows (11):
We also calculated the Mean Absolute Error (MAE; µgL
-1) and Mean Absolute Percentage Error (MAPE; %) for the five cross-validation folds to measure the predictive performance of the models as follows (12,13):
Bias was measured as follows (14):
Finally, we compared the performances of the models based on the average errors for all the cross-validation folds [
40].