Appendix A. Forecast Performance Evaluation Metrics
Streamflow data from 1990 to 2016 are used for the climatological streamflow calculation. The climatological streamflow is calculated on a daily basis using data from a 29-day window: for a given day, the climatology value is the distribution of data within the period from two weeks before to two weeks after the target day, pooled over the climatology period. Metrics were computed for each lead time, Day 1 to Day 7.
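As an illustrative sketch of the windowing described above (the `flows` mapping of dates to daily streamflow is a hypothetical data layout, not the study's actual storage format), the 29-day climatology sample for a target day could be assembled as:

```python
from datetime import date, timedelta

def climatology_sample(flows, target, start_year=1990, end_year=2016):
    """Pool all observations within +/-14 days of the target day
    (a 29-day window), over every year of the climatology period."""
    sample = []
    for year in range(start_year, end_year + 1):
        try:
            centre = date(year, target.month, target.day)
        except ValueError:  # 29 February in a non-leap year
            continue
        for offset in range(-14, 15):
            day = centre + timedelta(days=offset)
            if day in flows:
                sample.append(flows[day])
    return sample
```

The returned sample is the empirical distribution from which climatological reference quantities (e.g. the reference CRPS) can be derived.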
DETERMINISTIC FORECAST
PBias: This metric estimates whether the model is consistently underestimating or overestimating streamflow. It can be positive (underestimation) or negative (overestimation) and was calculated for each lead time (in days) as:

\[
\mathrm{PBias}_{LT} = \frac{\sum_{i=1}^{n}\left(Q_{\mathrm{obs},i} - Q_{\mathrm{sim},i}^{LT}\right)}{\sum_{i=1}^{n} Q_{\mathrm{obs},i}} \times 100
\tag{A1}
\]

In the above equation, \(Q_{\mathrm{obs}}\) was observed streamflow, \(Q_{\mathrm{sim}}\) was modelled streamflow, \(LT\) was lead time in days, and \(n\) was the total number of observations.
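A minimal sketch of the percent-bias calculation (variable names and data are illustrative):

```python
def pbias(obs, sim):
    """Percent bias: positive when the model underestimates,
    negative when it overestimates."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)
```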
Pearson's Correlation Coefficient (PCC): The PCC measures the linear correlation between the observed and simulated time series, in our case rainfall and streamflow respectively. It is calculated as:

\[
\mathrm{PCC} = \frac{\sum_{i=1}^{n}\left(Q_{\mathrm{obs},i}-\overline{Q}_{\mathrm{obs}}\right)\left(Q_{\mathrm{sim},i}-\overline{Q}_{\mathrm{sim}}\right)}{\sqrt{\sum_{i=1}^{n}\left(Q_{\mathrm{obs},i}-\overline{Q}_{\mathrm{obs}}\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(Q_{\mathrm{sim},i}-\overline{Q}_{\mathrm{sim}}\right)^{2}}}
\tag{A2}
\]

The value of PCC is bounded between −1 and 1 and measures the strength and direction of the relationship: a positive PCC means the two variables change in the same direction, a negative PCC that they change in opposite directions.
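A self-contained sketch of the Pearson correlation on synthetic series:

```python
def pcc(x, y):
    """Pearson correlation coefficient, bounded in [-1, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```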
Mean Absolute Error (MAE): The MAE is the average magnitude of the errors. The perfect score is zero, and it was calculated by:

\[
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Q_{\mathrm{sim},i}-Q_{\mathrm{obs},i}\right|
\tag{A3}
\]
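The MAE in a few lines (illustrative data):

```python
def mae(obs, sim):
    """Mean absolute error; 0 is a perfect score."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)
```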
Nash-Sutcliffe Efficiency (NSE): The Nash-Sutcliffe efficiency [80] quantifies the relative magnitude of the residual variance compared to the observed streamflow variance:

\[
\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(Q_{\mathrm{obs},i}-Q_{\mathrm{sim},i}\right)^{2}}{\sum_{i=1}^{n}\left(Q_{\mathrm{obs},i}-\overline{Q}_{\mathrm{obs}}\right)^{2}}
\tag{A4}
\]

In the above equation, \(\overline{Q}_{\mathrm{obs}}\) was the mean observed streamflow. In this study, the NSE is used to assess the performance of the model forecast at each lead time.
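A sketch of the NSE; note that a forecast equal to the observed mean scores exactly zero, which is what makes the NSE a skill measure against the mean:

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect; 0 means no better
    than forecasting the mean of the observations."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance
```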
Kling-Gupta Efficiency (KGE): The KGE [81] performance metric is widely used in environmental and hydrologic forecasting and is defined as:

\[
\mathrm{KGE} = 1 - \sqrt{\left(r-1\right)^{2}+\left(\alpha-1\right)^{2}+\left(\beta-1\right)^{2}}
\tag{A5}
\]

In the above equation, \(r\) is the Pearson correlation coefficient (Equation A2), \(\alpha\) is a term that represents the variability of the forecast errors, defined as the ratio of the standard deviations of the simulated and observed data, and \(\beta\) is the ratio of the means of the simulated and observed data.
Root Mean Square Error (RMSE): The RMSE measures the average difference between the predicted and observed values, and provides an estimate of how accurately the model can predict the target time series:

\[
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_{\mathrm{sim},i}-Q_{\mathrm{obs},i}\right)^{2}}
\tag{A6}
\]
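The RMSE on synthetic values; squaring the errors before averaging makes it penalise large errors more heavily than the MAE:

```python
def rmse(obs, sim):
    """Root mean square error; 0 is a perfect score."""
    return (sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs)) ** 0.5
```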
ENSEMBLE FORECAST
CRPS: This metric allows a quantitative comparison between deterministic and ensemble forecasts. It is calculated as the difference between the cumulative distribution of the forecast and that of the corresponding observation [82]. The CRPS reduces to the mean absolute error (MAE, Equation A3) for deterministic forecasts and is given by:

\[
\mathrm{CRPS} = \frac{1}{n}\sum_{t=1}^{n}\int_{-\infty}^{\infty}\left[F_{t}(x)-H\!\left(x-y_{t}\right)\right]^{2}\,dx
\tag{A7}
\]

In the above equation, \(F_{t}\) is the forecast cumulative distribution function (CDF) for the \(t\)-th forecast and \(H(x-y_{t})\) is the observed CDF, a Heaviside step function at the observation \(y_{t}\). For ensemble rainfall, the relative CRPS, as a function of catchment mean rainfall \(\bar{P}\), is calculated as:

\[
\mathrm{CRPS}_{\mathrm{rel}} = \frac{\mathrm{CRPS}}{\bar{P}}
\tag{A8}
\]
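For an ensemble forecast, the CRPS of a single forecast-observation pair can be computed directly from the members via the equivalent energy form (a sketch with synthetic member values, not the study's implementation):

```python
def crps_ensemble(members, obs):
    """Empirical CRPS of one ensemble forecast against one observation,
    via the energy form E|X - y| - 0.5 E|X - X'|. With a single member
    it reduces to the absolute error (the per-step MAE)."""
    m = len(members)
    error = sum(abs(x - obs) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return error - 0.5 * spread
```

Averaging this quantity over all forecast times gives the mean CRPS reported per lead time.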
CRPSS: This metric measures the relative performance of the streamflow forecast with respect to a reference forecast. It is calculated as:

\[
\mathrm{CRPSS} = 1 - \frac{\mathrm{CRPS}}{\mathrm{CRPS}_{\mathrm{ref}}}
\tag{A9}
\]

In the above equation, \(\mathrm{CRPS}_{\mathrm{ref}}\) is the score of the reference forecast, calculated from the streamflow climatology over the climatology period.
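The skill score itself is a one-line transformation of the two CRPS values (illustrative numbers):

```python
def crpss(crps_forecast, crps_reference):
    """Skill relative to a reference (e.g. climatology) forecast:
    1 is perfect, 0 matches the reference, negative is worse."""
    return 1.0 - crps_forecast / crps_reference
```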
PIT: The Probability Integral Transform (PIT) diagram is used to assess the reliability of ensemble forecasts [83]. It is the cumulative distribution function (CDF) of the forecast, \(F_{t}\), evaluated at the observation \(y_{t}\) (rainfall or streamflow), and is given by:

\[
\pi_{t} = F_{t}\left(y_{t}\right)
\tag{A10}
\]

The PIT is uniformly distributed for reliable forecasts and falls on the 1:1 line for a perfect forecast. To avoid relying on visual comparison alone, we used the quantitative Kolmogorov-Smirnov goodness-of-fit statistic (KS-D) to measure the deviation of the PIT values from the perfect forecast; the KS-D statistic measures the maximum deviation of the cumulative PIT distribution from the uniform distribution. We also used the PIT-Alpha score [84] to compare the PIT values of ensemble streamflow and rainfall forecasts across all catchments:

\[
\alpha = 1 - \frac{2}{n}\sum_{t=1}^{n}\left|\tilde{\pi}_{t} - \frac{t}{n+1}\right|
\tag{A11}
\]

In the above equation, \(\tilde{\pi}_{t}\) is the sorted \(\pi_{t}\).
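The two PIT-based quantities can be sketched as follows: the PIT value is the ensemble's empirical CDF at the observation, and KS-D is the largest gap between the sorted PIT sample and the uniform CDF (synthetic values; the study's actual implementation may differ):

```python
def pit_value(members, obs):
    """PIT: the ensemble (empirical) CDF evaluated at the observation."""
    return sum(1 for x in members if x <= obs) / len(members)

def ks_d(pit_values):
    """Kolmogorov-Smirnov D statistic of the PIT sample against U(0, 1):
    the maximum deviation of the empirical CDF from the uniform CDF."""
    s = sorted(pit_values)
    n = len(s)
    return max(max(abs(p - i / n), abs(p - (i + 1) / n))
               for i, p in enumerate(s))
```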
CATEGORICAL METRICS
The categorical metrics for the assessment of streamflow and rainfall forecasts included [85]: Probability of Detection (POD), False Alarm Ratio (FAR), and Critical Success Index (CSI). These metrics are extensively used in operational forecast assessment [52,53,55].
Probability of Detection (POD): The POD is based on the numbers of correctly identified (hits, \(H\)) and missed (\(M\)) forecast events:

\[
\mathrm{POD} = \frac{H}{H+M}
\tag{A12}
\]

The value ranges from 0 to 1, and the perfect score is 1.
False Alarm Ratio (FAR): The FAR depends on the events detected by the forecasts but not observed (false alarms, \(F\)) and the correctly identified ones (hits, \(H\)):

\[
\mathrm{FAR} = \frac{F}{H+F}
\tag{A13}
\]

The value of the metric ranges from the perfect score of 0 to 1.
Critical Success Index (CSI): The CSI represents the overall fraction of events correctly forecast by the model, based on the hits (\(H\)), misses (\(M\)), and false alarms (\(F\)):

\[
\mathrm{CSI} = \frac{H}{H+M+F}
\tag{A14}
\]

Its value ranges from 0 to the perfect score of 1.
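All three categorical scores follow from the counts in a 2x2 contingency table of hits, misses, and false alarms (illustrative counts):

```python
def categorical_scores(hits, misses, false_alarms):
    """POD, FAR and CSI from 2x2 contingency-table counts."""
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    csi = hits / (hits + misses + false_alarms)
    return pod, far, csi
```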
Figure A1.
Graphical representation of forecast rainfall performance metrics of a randomly selected catchment from Tasmania: (a) PBias, (b) PCC, (c) MAE, (d) NSE, (e) KGE, (f) RMSE, (g) CSI of 5th, 25th, 50th, 75th, and 95th percentiles, (h) FAR, (i) POD, (j) CRPS and (k) PIT Alpha.
Figure A2.
Graphical representation of forecast streamflow performance metrics of a randomly selected catchment from New South Wales: (a) PBias, (b) PCC, (c) MAE, (d) NSE, (e) KGE, (f) RMSE, (g) CSI of 5th, 25th, 50th, 75th and 95th percentiles, (h) FAR, (i) POD, (j) CRPS and (k) CRPSS and (l) PIT Alpha.