2.2.4. Hybrid Model PE_Trend Based on Dynamic Prediction Effectiveness
2.2.4.1. Modeling Mechanism of Combination Prediction Model Based on Prediction Effectiveness
The combination forecasting model based on prediction effectiveness uses the information provided by several forecasting methods and combines their outputs as a weighted average. To improve the accuracy of the combined forecast, this study addresses two questions: first, how to determine the weighting coefficient of each individual forecasting model within the combination; and second, whether the combined results are superior to those of the individual models. The Prediction Effectiveness-based combined forecasting model (PE) selected in this study uses the mean of the prediction accuracy and its mean square deviation, which reflects the degree of dispersion, to construct an optimization model for the weighting coefficients through linear programming (Figure 5). It overcomes the deviation in forecasting results caused by the different dimensions of the indicator sequences in ordinary combined forecasting models, thereby improving forecasting accuracy. Moreover, the model is intuitive, computationally concise, and of practical value.
The measured runoff sequence is $\{x_t,\ t = 1, 2, \ldots, N\}$, and $m$ individual prediction models are used to predict it. Let $x_{it}$ be the runoff value predicted by the $i$-th individual prediction method at time $t$, where $i = 1, 2, \ldots, m$, and let $e_{it}$ be the relative error between the measured runoff and the runoff predicted by the $i$-th method at time $t$, i.e., $e_{it} = (x_t - x_{it})/x_t$. Let $A_{it} = 1 - |e_{it}|$ with $0 \le |e_{it}| \le 1$; then $A_{it}$ is the prediction accuracy of the $i$-th runoff prediction model at time $t$, and clearly $0 \le A_{it} \le 1$.
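The definitions above can be illustrated with a minimal Python sketch. It assumes the accuracy is $A_{it} = 1 - |e_{it}|$ and that relative errors larger than 1 in magnitude are capped at 1 so that the accuracy stays in $[0, 1]$; the function names and the runoff values are hypothetical and serve only as an illustration.

```python
def relative_errors(observed, predicted):
    """Relative error e_it = (x_t - x_it) / x_t of one model at every time t."""
    return [(x - p) / x for x, p in zip(observed, predicted)]


def accuracies(observed, predicted):
    """Accuracy A_it = 1 - |e_it|, with |e_it| capped at 1 (assumption)."""
    return [1.0 - min(abs(e), 1.0) for e in relative_errors(observed, predicted)]


# Hypothetical measured runoff and the output of one prediction model.
observed = [100.0, 120.0, 80.0]
predicted = [90.0, 126.0, 200.0]

A = accuracies(observed, predicted)
print(A)
```

The third time point shows the effect of the cap: the prediction is off by more than 100 %, so its accuracy is set to 0 rather than becoming negative.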
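Let $\hat{x}_t$ be the combined prediction value of $x_t$; then:
$$\hat{x}_t = \sum_{i=1}^{m} l_i x_{it}$$
Here $l_i$ is the weighting coefficient of the $i$-th individual prediction model, with:
$$\sum_{i=1}^{m} l_i = 1, \qquad l_i \ge 0, \quad i = 1, 2, \ldots, m$$
If $A_t$ and $e_t$ are the combined prediction accuracy and relative error at time $t$, then:
$$e_t = \frac{x_t - \hat{x}_t}{x_t} = \sum_{i=1}^{m} l_i e_{it}, \qquad A_t = 1 - |e_t|$$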
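As a small illustration of the combination step, the following Python sketch forms the weighted combined prediction and its accuracy. It assumes nonnegative weights that sum to 1 and the accuracy definition $A_t = 1 - |e_t|$ with $|e_t|$ capped at 1; the function names and numbers are hypothetical.

```python
def combine(predictions, weights):
    """Weighted combination of m model outputs.

    predictions: list of m lists, one prediction sequence per model.
    weights: m nonnegative values that sum to 1.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    n = len(predictions[0])
    return [sum(w * p[t] for w, p in zip(weights, predictions)) for t in range(n)]


def combined_accuracy(observed, combined):
    """Accuracy A_t = 1 - |e_t| of the combined prediction, |e_t| capped at 1."""
    return [1.0 - min(abs((x - c) / x), 1.0) for x, c in zip(observed, combined)]


# Hypothetical data: one model underestimates, the other overestimates.
observed = [100.0, 120.0]
model_low = [90.0, 126.0]
model_high = [110.0, 114.0]

combined = combine([model_low, model_high], [0.5, 0.5])
accuracy = combined_accuracy(observed, combined)
print(combined, accuracy)
```

In this constructed case the errors of the two models offset each other, which is the situation in which a combined forecast can outperform each individual model.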
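The effectiveness of the $i$-th prediction model is:
$$S_i = E(A_i)\,[1 - \sigma(A_i)]$$
with
$$E(A_i) = \sum_{t=1}^{N} Q_{it} A_{it}, \qquad \sigma(A_i) = \sqrt{\sum_{t=1}^{N} Q_{it}\,[A_{it} - E(A_i)]^2}$$
where $Q_{it}$ is the weight of the accuracy $A_{it}$ of the $i$-th prediction method at time $t$ over the sample interval, with:
$$\sum_{t=1}^{N} Q_{it} = 1, \qquad Q_{it} > 0$$
Because no prior information on these weights is available, $Q_{it}$ is taken as $1/N$, so that:
$$E(A_i) = \frac{1}{N}\sum_{t=1}^{N} A_{it}, \qquad \sigma(A_i) = \sqrt{\frac{1}{N}\sum_{t=1}^{N} [A_{it} - E(A_i)]^2}$$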
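The effectiveness of a single model can be sketched in a few lines of Python. The sketch assumes equal weights $Q_{it} = 1/N$ and the effectiveness form $S_i = E(A_i)\,[1 - \sigma(A_i)]$, i.e., mean accuracy penalized by its dispersion; the function names and the accuracy values are hypothetical.

```python
import math


def mean_accuracy(A):
    """E(A_i): mean of the accuracy sequence with equal weights 1/N."""
    return sum(A) / len(A)


def accuracy_std(A):
    """sigma(A_i): mean square deviation of the accuracy sequence (1/N form)."""
    mu = mean_accuracy(A)
    return math.sqrt(sum((a - mu) ** 2 for a in A) / len(A))


def effectiveness(A):
    """S_i = E(A_i) * (1 - sigma(A_i)) (assumed form of the index)."""
    return mean_accuracy(A) * (1.0 - accuracy_std(A))


# Hypothetical accuracy sequence of one model.
A_i = [0.9, 0.8, 1.0, 0.9]
print(mean_accuracy(A_i), accuracy_std(A_i), effectiveness(A_i))
```

A model with the same mean accuracy but a more erratic accuracy sequence would receive a lower effectiveness value under this definition.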
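$E(A_i)$ and $\sigma(A_i)$ characterize the prediction effectiveness of each individual model. Treating the combined accuracy as the weighted sum of the individual accuracies, $A_t = \sum_{i=1}^{m} l_i A_{it}$ (which holds exactly when the individual errors at time $t$ share the same sign), the prediction accuracy sequence of the combined prediction model satisfies:
$$E(A) = \sum_{i=1}^{m} l_i E(A_i)$$
The variance $\sigma^2(A)$ is given by:
$$\sigma^2(A) = \sum_{i=1}^{m} l_i^2 \sigma^2(A_i) + 2\sum_{i<j} l_i l_j \rho_{ij}\, \sigma(A_i)\, \sigma(A_j)$$
where $\rho_{ij}$ is the correlation coefficient between the prediction accuracy of the $i$-th runoff prediction model and that of the $j$-th runoff prediction model, namely:
$$\rho_{ij} = \frac{\frac{1}{N}\sum_{t=1}^{N} [A_{it} - E(A_i)]\,[A_{jt} - E(A_j)]}{\sigma(A_i)\,\sigma(A_j)}$$
$E(A)$ is the average prediction accuracy of the combination prediction method over time; the larger its value, the higher the accuracy. $\sigma(A)$ measures the instability of the prediction accuracy sequence of the combination prediction model; the smaller its value, the better the model.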
The effectiveness index of the combination prediction model is defined as:
$$S = E(A)\,[1 - \sigma(A)]$$
In this equation, $S$ reflects how well the combined prediction fits the measured runoff. The larger $S$, the better the linear fit between the two and the higher the effectiveness of the predicted results.
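The combination prediction model based on prediction effectiveness is:
$$\max\ S = E(A)\,[1 - \sigma(A)]$$
$$\text{s.t.}\quad \sum_{i=1}^{m} l_i = 1, \qquad l_i \ge 0, \quad i = 1, 2, \ldots, m$$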
In this system of equations, once the predicted values $x_{it}$ of the $m$ prediction methods at each time have been calculated, the defining equations give $E(A_i)$, $\sigma(A_i)$, and $\sigma^2(A_i)$, from which $\rho_{ij}$ follows. The Lagrange multiplier method is then applied, using the calculated values of $E(A_i)$, $\sigma(A_i)$, $\sigma^2(A_i)$, and $\rho_{ij}$, to find the weighting coefficients $l_1, l_2, \ldots, l_m$ of the individual prediction models that maximize $S$.
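To make the optimization concrete, the sketch below searches for weights that maximize $S$. It is not the Lagrange multiplier solution described above: as a simplifying assumption it uses a brute-force grid search over the weight simplex and evaluates $S = E(A)\,[1 - \sigma(A)]$ directly from the accuracy sequence of the combined prediction. The function names, the grid resolution `steps`, and the data are hypothetical.

```python
import itertools
import math


def effectiveness(A):
    """S = E(A) * (1 - sigma(A)) with equal time weights 1/N (assumed form)."""
    mu = sum(A) / len(A)
    sd = math.sqrt(sum((a - mu) ** 2 for a in A) / len(A))
    return mu * (1.0 - sd)


def simplex_grid(m, steps):
    """Yield weight vectors of length m on a regular grid, summing to 1."""
    for combo in itertools.product(range(steps + 1), repeat=m - 1):
        if sum(combo) <= steps:
            yield [c / steps for c in combo] + [(steps - sum(combo)) / steps]


def best_weights(observed, predictions, steps=20):
    """Grid search for the weights that maximize S of the combined prediction."""
    best_w, best_s = None, -math.inf
    for w in simplex_grid(len(predictions), steps):
        combined = [sum(wi * p[t] for wi, p in zip(w, predictions))
                    for t in range(len(observed))]
        A = [1.0 - min(abs((x - c) / x), 1.0) for x, c in zip(observed, combined)]
        s = effectiveness(A)
        if s > best_s:
            best_w, best_s = w, s
    return best_w, best_s


# Hypothetical data: one model is 10 % low, the other 10 % high.
observed = [100.0, 120.0, 80.0, 90.0]
model_low = [90.0, 108.0, 72.0, 81.0]
model_high = [110.0, 132.0, 88.0, 99.0]

weights, s_max = best_weights(observed, [model_low, model_high])
print(weights, s_max)
```

A grid search scales poorly with the number of models, so it is only suitable for a handful of individual models; for larger $m$ a constrained optimizer would be the more natural choice.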
2.2.4.2. A Combined Forecasting Model for Dynamic Prediction Effectiveness
To further improve the prediction accuracy of the combination prediction model based on prediction effectiveness, this study proposes an improved dynamic Prediction Effectiveness Trend hybrid model (PE_trend). As shown in Figure 6, we set 5 unit times as one time step T and input the data within T into the prediction effectiveness hybrid model to calculate the individual model weights for this time segment, which are then used to compute the combined model simulation result for the time point immediately after T. (The shorter the time interval, the higher the accuracy of the model, although at the cost of increased computational time. In our testing, an interval of 5 units yielded more favorable simulation results while requiring less computational time.) Subsequently, the start and end of period T are each incremented by 1, and the combination prediction model is run again to obtain the weighting coefficients of the individual models for the next time point after T. This process is repeated until the last time point, yielding the individual model weights for all time points.
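The sliding-window procedure can be sketched as follows for the special case of two individual models. This is a minimal illustration under several assumptions: the weights within each window are found by a simple grid search on $S = E(A)\,[1 - \sigma(A)]$ rather than by the method used in the study, the accuracy is $1 - |e_t|$ capped to $[0, 1]$, and all function names and data are hypothetical.

```python
import math


def effectiveness(observed, combined):
    """S = E(A) * (1 - sigma(A)) of a combined prediction (assumed form)."""
    A = [1.0 - min(abs((x - c) / x), 1.0) for x, c in zip(observed, combined)]
    mu = sum(A) / len(A)
    sd = math.sqrt(sum((a - mu) ** 2 for a in A) / len(A))
    return mu * (1.0 - sd)


def fit_two_model_weight(observed, pred_a, pred_b, steps=20):
    """Weight w of model A (model B gets 1 - w) that maximizes S on a window."""
    best_w, best_s = 0.0, -math.inf
    for k in range(steps + 1):
        w = k / steps
        combined = [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]
        s = effectiveness(observed, combined)
        if s > best_s:
            best_w, best_s = w, s
    return best_w


def pe_trend(observed, pred_a, pred_b, window=5):
    """Fit weights on [start, start + window) and apply them to the next point.

    Returns a list of (time index, weight of model A, combined prediction).
    The first `window` time points receive no combined prediction here.
    """
    results = []
    for start in range(len(observed) - window):
        end = start + window  # the time point immediately after the window
        w = fit_two_model_weight(observed[start:end],
                                 pred_a[start:end], pred_b[start:end])
        results.append((end, w, w * pred_a[end] + (1.0 - w) * pred_b[end]))
    return results


# Hypothetical series: model A is exact, model B is 10 % too high.
observed = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0]
pred_a = list(observed)
pred_b = [x * 1.1 for x in observed]

results = pe_trend(observed, pred_a, pred_b, window=5)
for t, w, x_hat in results:
    print(t, w, x_hat)
```

With eight time points and a window of five, the loop produces weights for the last three time points, each fitted on the five points that precede it.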
2.2.5. Model Evaluation Indicators
This study selects the Nash-Sutcliffe Efficiency (NSE) to evaluate model performance. The NSE is a commonly used index proposed by the hydrologists Nash and Sutcliffe in 1970. It measures the fit between model simulation results and observed values and is used to assess model simulation performance [33]. A value of NSE closer to 1 indicates a better fit of the model to the observed values. NSE is calculated using the following formula:
$$NSE = 1 - \frac{\sum_{t=1}^{N} (Q_{obs,t} - Q_{m,t})^2}{\sum_{t=1}^{N} (Q_{obs,t} - \bar{Q}_{obs})^2}$$
In the equation, $Q_m$ represents the modeled runoff (m³/s), $Q_{obs}$ represents the observed runoff (m³/s), and $\bar{Q}_{obs}$ is the mean of the observed runoff.
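As an illustration, the Nash-Sutcliffe Efficiency in its standard form (one minus the ratio of the squared model error to the variance of the observations about their mean) can be computed as in the following Python sketch. The function name and the runoff values are hypothetical.

```python
def nash_sutcliffe(observed, modeled):
    """Nash-Sutcliffe Efficiency of modeled runoff against observed runoff."""
    mean_obs = sum(observed) / len(observed)
    error = sum((o - m) ** 2 for o, m in zip(observed, modeled))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error / spread


# Hypothetical observed runoff (m³/s).
q_obs = [1.0, 2.0, 3.0, 4.0]

perfect = nash_sutcliffe(q_obs, q_obs)          # model equals observations
mean_only = nash_sutcliffe(q_obs, [2.5] * 4)    # model always predicts the mean
print(perfect, mean_only)
```

The two calls mark the usual reference points: a perfect simulation scores 1, and a model that always returns the observed mean scores 0.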
The Taylor diagram is a graphical method for comparing model outputs with observed data in terms of correlation, standard deviation, and error. It is typically drawn as a polar plot in which the radial distance of a point represents the standard deviation and the angular position represents the correlation coefficient with the observations. The observed data and each model appear as points, so that the differences between the models and the observed data can be compared visually. A Taylor diagram typically displays three evaluation metrics: Correlation Coefficient, Standard Deviation, and Root Mean Square Error (RMSE).
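The Correlation Coefficient measures the strength and direction of the linear relationship between model predictions and observed values. It is calculated as:
$$r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}$$
In the equation, $N$ is the total number of data points, $x_i$ and $y_i$ are individual data points of the two variables, and $\bar{x}$ and $\bar{y}$ are the means of the two variables.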
The Standard Deviation measures the variability or spread of data points around the mean. In the Taylor diagram, it represents the dispersion of model predictions and observed values. It is calculated as:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2}$$
In the equation, $N$ is the total number of data points, $x_i$ represents each data point, and $\bar{x}$ is the mean value of the data points.
The Root Mean Square Error is a measure of the differences between values predicted by a model and the observed values. It is calculated as:
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$
In the equation, $N$ is the total number of data points, $y_i$ represents the observed values, and $\hat{y}_i$ represents the model-predicted values.
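The three statistics summarized by a Taylor diagram can be computed with a short Python sketch. It assumes the population form of the standard deviation (division by $N$ rather than $N - 1$) and the standard Pearson correlation coefficient; the function names and data are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Standard deviation with division by N (population form, an assumption)."""
    mu = mean(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


def correlation(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) *
                           sum((b - my) ** 2 for b in y))


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical observed and predicted values.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 4.0, 6.0, 8.0]

print(correlation(obs, pred), std(obs), std(pred), rmse(obs, pred))
```

The example is chosen so that the prediction is perfectly correlated with the observations yet has twice their standard deviation and a nonzero RMSE, which shows why the three metrics are read together.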