4.2. Photovoltaic Power Influencing Factors and Correlation Analysis
The Pearson correlation coefficient (PCC) is used to quantify the correlation between each meteorological factor and PV power output. The results indicate that global radiation has the highest correlation coefficient, that direct radiation, humidity and temperature have lower coefficients, and that wind speed has the smallest.
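For two sequences x and y of length n, the PCC is

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{21} $$

where $\bar{x}$ and $\bar{y}$ are the average values of the elements in x and y, respectively. Applying Equation (21) to all data samples gives the variable correlation table, where Rg, Rd, H, T, W and Wd represent global radiation, direct radiation, humidity, temperature, wind speed and wind direction.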
Table 1 lists the PCC values of the variables. A larger absolute PCC value indicates a stronger association. In this paper, meteorological variables with PCC values greater than 0.4 with PV power are selected as input variables for the CNN, which reduces input redundancy and lays the foundation for improving prediction accuracy.
From this table, four variables (global radiation, direct radiation, humidity and temperature) are strongly correlated with power. Therefore, the dimension of the input sequence is 4.
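The screening step can be illustrated with a minimal sketch. It assumes the threshold is applied to the absolute PCC value (the text only says "greater than 0.4"), and the function names `pearson` and `screen_features`, as well as the synthetic data, are hypothetical and not taken from the paper.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

def screen_features(features, power, threshold=0.4):
    """Keep variables whose |PCC| with power exceeds the threshold.

    Using the absolute value is an assumption of this sketch.
    """
    scores = {name: pearson(values, power) for name, values in features.items()}
    selected = [name for name, r in scores.items() if abs(r) > threshold]
    return selected, scores

# Synthetic illustration only (not the paper's data).
rng = np.random.default_rng(0)
rg = rng.uniform(0.0, 1000.0, size=500)             # stand-in for global radiation
wd = rng.uniform(0.0, 360.0, size=500)              # stand-in for wind direction
power = 0.02 * rg + rng.normal(0.0, 1.0, size=500)  # power driven mostly by Rg
selected, scores = screen_features({"Rg": rg, "Wd": wd}, power)
```

With real data, `features` would hold the six measured series (Rg, Rd, H, T, W, Wd) and `selected` would give the CNN input variables.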
Because the input sequence contains information from multiple time steps before the prediction point, computing time and memory consumption increase sharply if the input sequence is too long. The choice of input sequence length is therefore important for this experiment. In this paper, we use the autocorrelation coefficient to determine the length of the input sequence. The autocorrelation coefficient with delay h is:

$$ \rho_h = \frac{\operatorname{Cov}(P_t, P_{t-h})}{\sqrt{\operatorname{Var}(P_t)\,\operatorname{Var}(P_{t-h})}} $$

In the formula, $P_t$ represents the historical power sequence and $P_{t-h}$ represents the power sequence with a time lag of h × 5 min.
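As a rough illustration of the lag-h autocorrelation described above, the sketch below uses the standard sample estimator, which may differ in normalization from the form used in the paper. The function name `autocorrelation` is hypothetical.

```python
import numpy as np

def autocorrelation(power, h):
    """Sample autocorrelation of a power series at lag h (in 5-minute steps)."""
    p = np.asarray(power, dtype=float)
    if not 0 <= h < len(p):
        raise ValueError("lag must satisfy 0 <= h < len(power)")
    d = p - p.mean()
    # Products of deviations h steps apart, normalized by the total variance.
    return float((d[: len(p) - h] * d[h:]).sum() / (d ** 2).sum())

# Synthetic smooth daily-like curve sampled every 5 minutes (illustration only).
t = np.arange(288)
demo_power = np.sin(np.pi * t / 288)
demo_acf = [autocorrelation(demo_power, h) for h in range(13)]
```

Evaluating the function for h = 1, 2, ... on the historical power series gives the kind of lag table from which an input length can be chosen.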
According to Table 2, the correlation decreases gradually as the time delay h increases. Based on the previous analysis, an input sequence length of 12 is suitable: each input sample consists of 12 groups of 4-dimensional data (one hour at 5-minute resolution) preceding the power point to be predicted.
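The input layout described here can be sketched as a sliding window over the feature matrix. This is a minimal illustration that assumes the four selected variables are stored as columns of a (T, 4) array aligned with the power series; the helper name `make_windows` is hypothetical.

```python
import numpy as np

def make_windows(features, power, seq_len=12):
    """Build inputs of shape (samples, seq_len, n_features) and next-step targets.

    Sample i uses features[i : i + seq_len] to predict power[i + seq_len].
    """
    features = np.asarray(features, dtype=float)
    power = np.asarray(power, dtype=float)
    n_samples = len(power) - seq_len
    if n_samples <= 0:
        raise ValueError("series is shorter than the input sequence length")
    X = np.stack([features[i : i + seq_len] for i in range(n_samples)])
    y = power[seq_len:]
    return X, y
```

For a series of T time steps this yields T - 12 samples, each a 12 × 4 block.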
4.7. Comparison and Analysis of the Results
To validate the effectiveness of the Conv-LSTM-ATT model, this study selects one sunny day, one cloudy day and one rainy day from the three clustered weather types as the test set. Three deep learning models (LSTM, Bi-LSTM and Conv-LSTM) are introduced as benchmarks. The proposed model and the baseline models are compared on the same dataset, that is, only the historical data of the target site is used, and their prediction results are then analyzed and compared.
Figure 9, Figure 10 and Figure 11 show the prediction results of the four models under different weather conditions, and Table 9, Table 10 and Table 11 show the corresponding prediction errors. Comparing the three tables, the model proposed in this article achieves the lowest RMSE under sunny, cloudy and rainy conditions: 0.1636, 0.2358 and 0.2421, respectively.
Table 9 shows that in sunny conditions, the photovoltaic output power fluctuates only slightly and the power curve changes smoothly, so all four models can predict the trend of the output power. The R² values of the LSTM, Bi-LSTM and Conv-LSTM prediction models are 0.933, 0.942 and 0.951, respectively, while the model proposed in this article reaches 0.973, the highest of the four.
Table 10 shows that in cloudy conditions, the continuous movement of clouds causes the solar radiation intensity received by the photovoltaic components to change continuously, leading to large fluctuations in the fitting curve of the predicted and actual values of photovoltaic output power.
Table 11 shows that in rainy and snowy weather, the RMSEs of the LSTM, Bi-LSTM and Conv-LSTM prediction models are 0.3226, 0.3218 and 0.2886, respectively, while the RMSE of the model proposed in this article is 0.2421, lower than all three. The above analysis indicates that the proposed model performs best under all three types of weather conditions.
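For reference, the four evaluation indicators used in these tables can be computed as in the sketch below. It uses the standard textbook definitions, which are assumed rather than confirmed to match the paper's exact formulas; the function name `evaluate` is hypothetical.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent) and R² for a pair of series.

    MAPE is undefined where y_true is zero (for example at night), so such
    points must be filtered out by the caller before using this sketch.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    mape = float(np.mean(np.abs(err / y_true)) * 100.0)
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}
```

Lower RMSE, MAE and MAPE and a higher R² indicate a better fit.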
Compared with the LSTM model, the proposed model reduces the RMSE by 18.28%, 27.99% and 24.95% in sunny, cloudy and rainy weather, respectively, the MAPE by 36.76%, 45.26% and 41.73%, and the MAE by 24.97%, 27.10% and 16.53%. Compared with the Bi-LSTM model, it reduces the RMSE by 13.02%, 22.86% and 24.76%, the MAPE by 20.86%, 33.62% and 34.28%, and the MAE by 16.03%, 19.37% and 12.14%. Compared with the Conv-LSTM model, it reduces the RMSE by 10.84%, 14.28% and 16.11%, the MAPE by 8.76%, 7.23% and 15.98%, and the MAE by 13.07%, 15.26% and 6.06%. These results indicate that the proposed model effectively combines the advantages of the CNN and LSTM methods and uses the attention mechanism to compensate for the LSTM's difficulty in retaining key information over long input sequences, thereby improving prediction accuracy.
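The percentage reductions are relative to the baseline error. As a quick check, the sketch below recomputes the rainy-weather RMSE reductions from the RMSE values quoted in the text; small differences in the last digit are expected because the published RMSEs are already rounded. The function name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which the proposed error is lower than the baseline error."""
    return (baseline - proposed) / baseline * 100.0

# Rainy-weather RMSE values quoted in the text.
rainy_rmse = {"LSTM": 0.3226, "Bi-LSTM": 0.3218, "Conv-LSTM": 0.2886}
proposed_rmse = 0.2421

for name, rmse in rainy_rmse.items():
    print(f"{name}: {relative_reduction(rmse, proposed_rmse):.2f}%")
```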
Processing time is crucial for real-time applications, where faster predictions are often desirable. In our experiments, the Conv-LSTM-ATT model has a slightly higher processing time than the other models. This increase can be attributed to the complexity of the model, in particular the integration of the attention mechanism. Although it adds to the prediction time, the improvement in prediction accuracy (shown by the lower MAPE, MAE and RMSE values) can justify the trade-off in contexts where accuracy is more critical than computation speed.
In this study, Bayesian optimization is applied to tune the data fusion ratios of the photovoltaic power prediction model. The optimization runs for 100 iterations, uses the Expected Improvement (EI) acquisition function to balance exploration and exploitation, and searches a fusion-ratio space from 0% to 100%, which covers every configuration from no fusion to full fusion. The optimal data fusion ratios found for each weather condition are as follows: under sunny conditions, 38.72%, 2.36%, 26.83% and 14.50%; under cloudy conditions, 49.11%, 6.77%, 23.46%, 9.88% and 17.68%; under snowy/rainy conditions, 30.18%, 12.05% and 19.45%. These optimized fusion ratios were then applied to the training set data under the corresponding weather conditions, followed by evaluation using the Conv-LSTM-ATT prediction model.
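To make the role of the acquisition function concrete, the sketch below gives the usual closed form of Expected Improvement for a minimization problem. It assumes a surrogate model (commonly a Gaussian process, which the text does not specify) that returns a posterior mean `mu` and standard deviation `sigma` for a candidate set of fusion ratios. The function name and the margin parameter `xi` are hypothetical.

```python
import math

def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected Improvement for minimization at one candidate point.

    mu, sigma: surrogate posterior mean and standard deviation of the error.
    best: lowest error observed so far. xi: optional exploration margin.
    """
    improvement = best - mu - xi
    if sigma <= 0.0:
        # No uncertainty: the improvement is deterministic.
        return max(improvement, 0.0)
    z = improvement / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # First term rewards a low predicted error (exploitation),
    # second term rewards high uncertainty (exploration).
    return improvement * cdf + sigma * pdf
```

At each iteration the optimizer would evaluate the candidate with the largest EI, retrain or re-score the prediction model with those ratios, and update the surrogate.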
The experiment compares four data fusion strategies, with the actual values as a reference for prediction accuracy: "No Fusion", using only historical data from the target site; "Uniform Fusion", evenly fusing data from all surrounding stations; "Similarity-Filtered Fusion", evenly fusing data from nearby stations selected based on similarity; and "Bayesian Optimized Similarity Fusion", determining the optimal fusion ratios for the similarity-selected nearby stations through Bayesian optimization. The results show that, compared with the no fusion and uniform fusion strategies, the similarity-filtered fusion and Bayesian optimized similarity fusion strategies significantly improve prediction accuracy. The Bayesian optimized similarity fusion in particular outperforms the other strategies under all test conditions.
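The text does not spell out how a fusion ratio is applied to the training data. One possible reading, shown here purely as an assumption, is that a ratio r means a fraction r of a neighboring station's samples is added to the target site's training set. The helper `fuse_training_data` and its random subsampling are hypothetical and are not the authors' confirmed procedure.

```python
import numpy as np

def fuse_training_data(target, neighbours, ratios, seed=0):
    """Append a random fraction of each neighbouring station's samples.

    ASSUMPTION: ratio r in [0, 1] is interpreted as the share of a
    neighbour's samples merged into the target training set.
    """
    if len(neighbours) != len(ratios):
        raise ValueError("one ratio is required per neighbouring station")
    rng = np.random.default_rng(seed)
    parts = [np.asarray(target, dtype=float)]
    for data, ratio in zip(neighbours, ratios):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("fusion ratios must lie in [0, 1]")
        data = np.asarray(data, dtype=float)
        k = int(round(ratio * len(data)))
        idx = np.sort(rng.choice(len(data), size=k, replace=False))
        parts.append(data[idx])
    return np.concatenate(parts, axis=0)
```

Under this reading, "No Fusion" corresponds to all ratios equal to 0, "Uniform Fusion" to equal ratios for every station, and the Bayesian optimized strategy to a separate tuned ratio per selected station.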
These findings indicate that appropriate data fusion strategies can significantly enhance the performance of photovoltaic power prediction models, and Bayesian optimization serves as a powerful tool to effectively implement these strategies, especially in environments requiring high data diversity and complexity.
Figure 12, Figure 13 and Figure 14 show the prediction results of the proposed model at different integration ratios, and Table 12, Table 13 and Table 14 show the corresponding prediction errors. From these experiments, we can draw the following conclusions.
Significant Reduction in Error Metrics: The introduction of more data from neighboring stations significantly reduced error metrics such as RMSE and MAE. By applying Bayesian optimization to determine the optimal fusion ratios of data from nearby stations based on similarity, the RMSE decreased by 20.04%, 28.24%, and 30.94% under sunny, cloudy, and rainy conditions respectively, and MAPE decreased by 30.30%, 18.83%, and 29.27%. Similarly, MAE also decreased by 23.07%, 17.58%, and 31.36% under these weather conditions. These reductions emphasize that the model's ability to predict PV power output is enhanced when supported with more extensive spatial data.
Variability in Prediction Accuracy Across Weather Conditions: The improvement in prediction accuracy varies across weather conditions. Under rainy conditions, integrating more data from surrounding areas compensated for the lack of historical data at the target site, and the reduction in prediction error was the greatest (30.94% in RMSE and 31.36% in MAE). This shows that the model benefits most from additional data where the target site's own data is scarce.
Improvement in R² Value and the Trade-off with Time: As more data is integrated, the model's R² value improves, indicating closer agreement between predicted and actual values. However, this accuracy comes at the cost of increased computational time: the higher the degree of data integration, the longer the prediction time.
In predicting photovoltaic power, the Conv-LSTM-ATT model that integrates spatial data from surrounding stations exhibits excellent performance. This strategy makes effective use of diverse data sources, improves the model's predictive accuracy across the weather conditions tested, and demonstrates its potential for practical PV power forecasting.