1. Introduction
The global focus on renewable energy sources has intensified in response to energy crises, air pollution, and concerns over global warming. It is projected that renewable energy will contribute to approximately 40% of global energy consumption by 2030 [1]. Among renewable energy technologies, solar photovoltaic (PV) power generation stands out for its direct conversion of solar power into electrical energy using solar modules. This method is highly regarded for its environmental benefits, including minimal air, water, and noise pollution, adaptability to local conditions, low installation costs, and potential for grid integration [2,3].
Recent data underscores the rapid growth of solar power capacity worldwide. According to Rethink Energy, global installations in the first nine months of 2022 increased by 54 GW, reaching a total installed capacity of approximately 142.5 GW, with projections indicating an annual installation capacity of 222 GW [4]. Similarly, in the European Union, new PV installations totaled 41.4 GW in 2022, bringing the cumulative installed capacity to 208.9 GW by the year's end [5]. China reported a cumulative installed capacity of 396.261 GW by the end of 2022, following a significant annual installation of 87.41 GW [6]. In Korea, the implementation of the 'Renewable Energy 3020 Implementation Plan' has substantially increased solar power capacity from 1,362 MW in 2017 to 4,658 MW in 2020 [7].
Solar power forecasting has been a focal point of research, particularly in regions with established solar observation infrastructure like Europe and the United States. Early efforts in solar power forecasting leveraged advanced technologies, equipment, and extensive data availability. As global solar power installations continue to expand, so does the advancement in solar power forecasting methodologies. The primary objective of this research is to enhance prediction accuracy through developing and synthesizing diverse forecasting models, addressing economic considerations and enhancing overall performance [8,9].
Solar power forecasting methods encompass a spectrum of approaches, including traditional physics and statistics-based models, new machine learning techniques, optimization algorithms, deep learning, and hybrid models [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34]. These methods cater to different forecasting timeframes: short-term (hours to weeks), medium-term (weeks to months), and long-term (months to years) [10,11,12].
For short-term forecasting, methods such as exponential smoothing, ARIMA, k-nearest neighbors (KNN), decision trees, support vector machines (SVM), particle swarm optimization (PSO), and artificial intelligence (AI) have been applied [13,14,15,16,17,18,19]. Medium-term forecasting expands into weeks and months, incorporating variables like weather data, events, and economic indicators, utilizing models such as linear regression, neural networks, recurrent neural networks (RNN), long short-term memory (LSTM), convolutional neural networks (CNN), among others [20,21,22,23,24,25]. Long-term forecasting, spanning months to years, necessitates more comprehensive consideration of system elements and utilizes algorithms like time series analysis, ARIMA, and ensemble methods [26,27,28].
Each forecasting algorithm possesses distinct strengths and limitations. For instance, ARIMA excels in short-term predictions but struggles with seasonal and non-stationary data, while LSTM is suitable for capturing long-term dependencies but demands significant computational resources [26,27,28]. Hybrid models, combining multiple algorithms based on data characteristics and forecasting objectives, have emerged as a promising approach in solar power forecasting [29,30,31,32,33,34]. By integrating the strengths of different models, hybrid approaches aim to enhance prediction accuracy and robustness.
This study proposes a hybrid forecasting method for short- and medium-term solar power generation that combines Prophet, which handles long-term trends, seasonality, and holiday events, with GRU, which captures complex, non-linear patterns in the meteorological data. The methodology includes data collection, preprocessing, and experimentation with GRU on a multivariate dataset at 15-minute intervals that comprises the meteorological variables and the residuals between the Prophet predictions and the observed data. We conduct experiments for short-term (2 days and 7 days) and medium-term (15 days and 30 days) power generation predictions.
The structure of this paper is as follows:
Section 2 reviews related research on Prophet and GRU models.
Section 3 identifies limitations in existing Prophet models and proposes solution methods.
Section 4 describes the data collection, the pre-processing method for collected data, the data splitting method for learning and testing, and the proposed method using Prophet and GRU.
Section 5 outlines the methodology and presents experimental results.
Finally, Section 6 concludes with a discussion of the potential impact of the proposed hybrid approach and future research directions.
4. Proposed Methods
4.1. Data Collection
4.1.1. Solar PV System Install Location
To validate the effectiveness of the proposed method, we used the solar power generation system installed at Company W in Naju, Jeollanam-do, South Korea. Table 1 presents its location and detailed information. The system is primarily used to generate solar power for self-consumption. The site comprises three buildings with a total area of 600,983 m². The building structure is constructed using H-beams, and the exterior walls are made of sandwich panels. The installed photovoltaic (PV) system has a capacity of 50 kW, the PCS (Power Conversion System) has a capacity of 100 kW, and the battery capacity is 200 kW.
4.1.2. Power Generation and Meteorological Data
This study's power generation data was collected from July 1, 2018, to September 30, 2019, at 15-minute intervals from W Company (Petrochemical) located in Naju, Jeollanam-do, South Korea. Korean holiday data, including substitute holidays, election days, and traditional holidays, were generated using the work calendar package [37]. Additionally, weather data including time, precipitation, temperature, wind direction, wind speed, humidity, and atmospheric pressure for the location in Naju were collected at hourly intervals from the Korea Meteorological Administration's weather data portal (https://data.kma.go.kr/cmmn/main.do).
The most relevant variables for power generation were selected among the weather data. These variables include precipitation-related parameters such as rain, rain15, and rain day, as well as temperature, wind speed, humidity, sunshine duration, and cloud cover [38].
4.2. Data Pre-Processing
To align the time intervals of the power data (15 minutes) and the weather data (hourly), linear interpolation is used to resample the weather data to 15-minute intervals. This ensures that both datasets have matching timestamps. Additionally, any missing values in the power and weather data are replaced with zero. To minimize the impact of outliers on the analysis, the data are normalized using RobustScaler [39]. The equation for RobustScaler normalization is as follows:
$$x' = \frac{x - \operatorname{median}(x)}{Q_3(x) - Q_1(x)} \qquad (3)$$
In Equation (3), x is the original data value, x' is the normalized data value, median(x) is the median of the data, Q1(x) is the first quartile (25th percentile) of the data, and Q3(x) is the third quartile (75th percentile) of the data. Because the median and the interquartile range are used instead of the mean and the standard deviation, RobustScaler normalizes the data while being less sensitive to the influence of outliers.
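The pre-processing steps described here can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the function names and the sample values are hypothetical, and the scaling follows the default behaviour of scikit-learn's RobustScaler (centre on the median, divide by the interquartile range).

```python
from statistics import median, quantiles


def upsample_hourly_to_15min(hourly):
    """Linearly interpolate hourly readings onto a 15-minute grid."""
    out = []
    for left, right in zip(hourly, hourly[1:]):
        for step in range(4):  # 0, 15, 30 and 45 minutes past the hour
            out.append(left + (right - left) * step / 4)
    out.append(hourly[-1])  # keep the final hourly reading
    return out


def fill_missing_with_zero(values):
    """Replace missing readings (None) with zero, as described in the text."""
    return [0.0 if v is None else float(v) for v in values]


def robust_scale(values):
    """Scale as (x - median) / (Q3 - Q1), the RobustScaler default."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:  # assumption: leave constant columns unscaled
        iqr = 1.0
    med = median(values)
    return [(v - med) / iqr for v in values]


# Hypothetical sample data: three hourly temperatures, five power readings.
temp_15min = upsample_hourly_to_15min([20.0, 24.0, 22.0])
power = fill_missing_with_zero([0.0, None, 12.0, 500.0, 10.0])
power_scaled = robust_scale(power)
```

In this toy example the outlier reading of 500 does not shift the centre or the scale, because both are derived from quartiles rather than from the mean and standard deviation.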
4.3. Data Partition for Training and Test Data
The data were split chronologically into training data, spanning from July 1, 2018, to June 30, 2019, and test data, spanning from July 1, 2019, to September 30, 2019. At 15-minute intervals, this corresponds to approximately 80% of the samples for training (365 days × 24 hours × 4 intervals per hour = 35,040 samples) and 20% for testing (92 days × 24 hours × 4 intervals per hour = 8,832 samples). Time series data is sensitive to temporal order, so shuffling the data randomly can lead to the model making inaccurate predictions of future data. Therefore, in this study, we did not apply cross-validation.
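The chronological split can be checked with a short sketch. The dates come from the text; the variable names are illustrative only, and the timestamps stand in for the actual measurement records.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)


def timestamps(start, end_exclusive):
    """Generate 15-minute timestamps in [start, end_exclusive)."""
    out = []
    t = start
    while t < end_exclusive:
        out.append(t)
        t += STEP
    return out


# July 1, 2018 through September 30, 2019 (inclusive).
all_ts = timestamps(datetime(2018, 7, 1), datetime(2019, 10, 1))

# Split by date, never by random shuffling, so the test set lies strictly
# after the training set in time.
split_at = datetime(2019, 7, 1)
train_ts = [t for t in all_ts if t < split_at]
test_ts = [t for t in all_ts if t >= split_at]

train_fraction = len(train_ts) / len(all_ts)
```

Here `len(train_ts)` is 35,040, `len(test_ts)` is 8,832, and the training share is about 0.80.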
4.4. Modified Prophet Model
The training data used in this study consist of power generation data collected from July 1, 2018, to June 30, 2019. For this period, public holidays and substitute holidays (such as election days and traditional holidays) were taken into account to improve the accuracy of the power generation predictions. Specifically, when applying the Prophet model, the trend flexibility (changepoint_prior_scale = 0.01), the holiday effect (holidays_prior_scale = 0.25), the yearly seasonality (yearly_seasonality = 10), and the additional meteorological variables (‘rain’, ‘rain15’, ‘rainday’, ‘temp’, ‘wind’, and ‘atm’) were tuned through simulations. The simulation results are shown in Figure 7 and Figure 8.
Table 2. Proposed Prophet model parameter settings.
Parameter Nature | Parameter Name | Value
Trend parameters | growth | linear
Trend parameters | changepoints | None
Trend parameters | n_changepoints | 25
Trend parameters | changepoint_range | 0.8
Trend parameters | changepoint_prior_scale | 0.01
Seasonality parameters | yearly_seasonality | 10
Seasonality parameters | weekly_seasonality | False
Seasonality parameters | daily_seasonality | False
Seasonality parameters | seasonality_mode | multiplicative
Seasonality parameters | seasonality_prior_scale | 10
Holidays parameters | holidays | df
Holidays parameters | holidays_prior_scale | 0.25
Flow parameters | flow | flow
Flow parameters | flow_prior_scale | 10
Other parameters | mcmc_samples | 0
Other parameters | interval_width | 0.8
Figure 7 and Figure 8 show more stable predictions than Figure 2 and Figure 3, respectively, which are the results of the existing Prophet model. In Figure 2, the prediction direction is not constant and shows increases and decreases in various directions, whereas in Figure 3, the prediction direction is constant.
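As an illustration, the settings in Table 2 map onto the constructor arguments of the open-source `prophet` package roughly as follows. This is a sketch under assumptions rather than the authors' code: the helper name `build_prophet` and the `holidays_df` argument are hypothetical, and the "flow" entries of Table 2 are left out because this section does not say how they are passed to the model.

```python
# Constructor arguments taken from Table 2.
PROPHET_KWARGS = dict(
    growth="linear",
    changepoints=None,
    n_changepoints=25,
    changepoint_range=0.8,
    changepoint_prior_scale=0.01,
    yearly_seasonality=10,
    weekly_seasonality=False,
    daily_seasonality=False,
    seasonality_mode="multiplicative",
    seasonality_prior_scale=10,
    holidays_prior_scale=0.25,
    mcmc_samples=0,
    interval_width=0.8,
)

# Meteorological variables named in the text, added as extra regressors.
REGRESSORS = ["rain", "rain15", "rainday", "temp", "wind", "atm"]


def build_prophet(holidays_df):
    """Create an unfitted Prophet model with the Table 2 settings.

    holidays_df is assumed to be a DataFrame with 'holiday' and 'ds'
    columns, which is the format Prophet expects for custom holidays.
    """
    from prophet import Prophet  # imported lazily; requires the prophet package

    model = Prophet(holidays=holidays_df, **PROPHET_KWARGS)
    for name in REGRESSORS:
        model.add_regressor(name)
    return model
```

The returned model would then be fitted with `model.fit(train_df)`, where `train_df` holds the `ds` and `y` columns plus one column per regressor.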
4.5. Proposed Hybrid Model
To reduce the error between Prophet's prediction and the actual measurement shown in Figure 6, we propose lowering the error rate of the power generation prediction by using the residuals between the predicted and actual values of the long-term (yearly and monthly) trend, as shown in Equation (4).
$$\mathrm{Error} = \mathrm{Prophet}_{pred} - y \qquad (4)$$
In Equation (4), Error is the difference between the predicted values from the original Prophet model (Prophet_pred) and the actual values (y).
The Error term defined in Equation (4) is used as an input to the GRU model, as illustrated in Figure 9. The input dataset comprises nine variables: eight meteorological parameters, namely precipitation (rain, rain15, and rain day), temperature, wind speed, humidity, sunshine duration, and cloud cover, plus the Error term. In Figure 9, the meteorological input represents the values at the previous time step (t-1) of rainfall (rain, rain15, and rain day), temperature, wind speed, humidity, sunshine duration, and cloud cover, while the remaining input represents the Error term.
The proposed hybrid model's training and testing data are derived from the same dataset. The input layer of the GRU model consists of 9 variables: the 8 meteorological variables (rain, rain15, rain day, temperature, wind speed, humidity, sunshine duration, and cloud cover) and one Error variable. The hidden layer of the GRU model contains 9 nodes.
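The nine-variable input described above could be assembled as in the following sketch. Several details are assumptions made only for illustration: the column names, the sign convention of the Error term (Prophet prediction minus observation), and the choice to lag the Error term by one step together with the weather variables.

```python
# Eight meteorological variables (hypothetical column names).
FEATURES = ["rain", "rain15", "rainday", "temp",
            "wind", "humidity", "sunshine", "cloud"]


def prophet_error(prophet_pred, y):
    """Error term: Prophet prediction minus observed value (assumed sign)."""
    return [p - a for p, a in zip(prophet_pred, y)]


def build_gru_inputs(weather_rows, error):
    """For each step t >= 1, collect the 8 weather values and the Error at t-1."""
    inputs = []
    for t in range(1, len(weather_rows)):
        prev = weather_rows[t - 1]
        inputs.append([prev[name] for name in FEATURES] + [error[t - 1]])
    return inputs


# Toy example with three 15-minute steps of made-up values.
weather = [
    {"rain": 0.0, "rain15": 0.0, "rainday": 0.0, "temp": 21.0,
     "wind": 1.2, "humidity": 60.0, "sunshine": 0.5, "cloud": 3.0},
    {"rain": 0.0, "rain15": 0.0, "rainday": 0.0, "temp": 21.5,
     "wind": 1.0, "humidity": 59.0, "sunshine": 0.6, "cloud": 3.0},
    {"rain": 0.1, "rain15": 0.1, "rainday": 0.1, "temp": 21.2,
     "wind": 1.4, "humidity": 63.0, "sunshine": 0.3, "cloud": 5.0},
]
error = prophet_error([10.0, 20.0, 30.0], [8.0, 25.0, 30.0])
gru_inputs = build_gru_inputs(weather, error)
```

Each row of `gru_inputs` has nine entries, matching the size of the GRU input layer.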
Table 3 shows the training and testing options used to simulate the GRU. The initial learning rate is set to 0.005, and the maximum number of epochs is 500. The mean squared error (MSE) is used as the loss function, ADAM [40] as the optimizer, and ReLU [41] as the activation function during both training and testing.
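A GRU with these options could be set up in Keras along the following lines. This is only a sketch: the use of TensorFlow/Keras, the single GRU layer, the `Dense(1)` output layer, and the `lookback` window length are assumptions that this section does not specify.

```python
# Training options taken from the text and Table 3.
GRU_OPTIONS = {
    "input_features": 9,   # 8 meteorological variables + Error term
    "units": 9,            # nodes in the GRU hidden layer
    "epochs": 500,
    "learning_rate": 0.005,
    "loss": "mse",
    "activation": "relu",
}


def build_gru(lookback):
    """Build and compile a small GRU regressor (assumed architecture).

    lookback is a hypothetical number of past 15-minute steps per sample.
    """
    import tensorflow as tf  # imported lazily; requires TensorFlow

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(lookback, GRU_OPTIONS["input_features"])),
        tf.keras.layers.GRU(GRU_OPTIONS["units"],
                            activation=GRU_OPTIONS["activation"]),
        tf.keras.layers.Dense(1),  # assumed single-value output
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=GRU_OPTIONS["learning_rate"]),
        loss=GRU_OPTIONS["loss"],
    )
    return model
```

Training would then call `model.fit(x_train, y_train, epochs=GRU_OPTIONS["epochs"])` on arrays shaped `(samples, lookback, 9)`.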
6. Conclusion
The proposed hybrid model improves the accuracy of solar power generation forecasts, which is essential for the strategic planning, management, and operation of power systems. Across the four forecasting horizons in Table 4, the proposed model achieved the lowest RMSE (1588.59 to 2510.73), compared with 5765.38 to 6601.82 for the modified Prophet model and 1660.25 to 2521.99 for the multivariate GRU. This model supports the maintenance of a continuous and sustainable energy supply while enhancing the operational efficiency of renewable energy systems and power markets. By integrating the Prophet model's capacity to handle long-term trends and seasonality with the GRU's ability to capture complex, non-linear patterns in meteorological data, this study presents a robust algorithm for forecasting solar power generation over both short- and medium-term periods. The evaluation indicates that the hybrid model outperforms the compared models, positioning it as a reliable tool for improving solar power generation forecasts.
Looking ahead, future research could focus on extending the model’s capabilities to real-time or near-real-time forecasting. Such an advancement would allow for more dynamic and responsive predictions, which would be especially valuable for grid operators in need of real-time data for load balancing and resource optimization. Additionally, integrating more advanced deep learning models and exploring the model’s scalability across different geographical regions and climatic conditions would further enhance its applicability. Finally, addressing the computational efficiency of the hybrid model could enable its use in resource-constrained environments, broadening its practical relevance.
Figure 1.
Power generation data of Company W collected from July 1, 2018, to Nov. 30, 2019.
Figure 2.
Simulation using the prophet model considering the holiday effect for one year (July 2018 to June 2019).
Figure 3.
Trend, holidays, weekly, and daily prediction analysis of prophet model. The blue line shows the trend the model fits from the test data, and the light blue shade shows the predicted trend.
Figure 4.
Structure of GRU.
Figure 5.
Comparison of performance of the prophet model (yhat) and actual observed value (y) from July to September 2019.
Figure 6.
Performance comparison between observed value (y) and prophet model (prophet) on July 2, 2019.
Figure 7.
Error forecast of the proposed Prophet model. The black dots represent the historical input data, the blue line represents the predicted trend line after model fitting, and the light blue area above and below the blue curve represents the confidence interval.
Figure 8.
Trend, holidays, weekly, and daily prediction analysis of the proposed prophet model. The blue line shows the trend the model fits from the test data, and the light blue shade shows the predicted trend.
Figure 9.
Proposed hybrid model.
Figure 10.
Comparison of performance between observed values (y) and other methods for 1 September 2019 (one day).
Figure 11.
Comparison of performance between observed values (y) and other methods from 1 July to 2 July 2019 (2 days). The area “E” shows the largest prediction error in ‘GRU using multivariate’.
Figure 12.
Comparison of performance between observed values (y) and other methods from 1 October to 7 October 2019 (7 days). The area “E” shows the largest prediction error in ‘GRU using multivariate’.
Figure 13.
Comparison of performance between observed values (y) and other methods from 15 August to 30 August 2019 (15 days). The area “E” shows the largest prediction error in ‘GRU using multivariate’.
Figure 14.
Comparison of performance between observed values (y) and other methods from 1 September to 30 September 2019 (30 days). The area “E” shows the largest prediction error in ‘GRU using multivariate’.
Table 1. Solar power generation system installation location and information.
Division | Information
Location | Naju, Jeollanam-do, South Korea
Main purpose of usage | Solar power generation for self-consumption
Building area | Total 3 buildings, 600,983 m²
Number of floors | 1st floor of the factory building and one other building
Building structure | H-beam
Outer wall | Sandwich panels
PV system capacity | 50 kW
ESS | PCS: 100 kW, battery: 200 kW
Table 3. Training and testing options for the proposed hybrid model.
Parameter | GRU
Number of layers | 9
Number of neurons | 9
Number of epochs | 500
Learning rate | 0.005
Loss function | MSE
Optimization | ADAM
Weight initializer | 1
Activation function | ReLU
Table 4. Comparison of the performance of modified Prophet, GRU, and proposed method.
Term | Period | Metrics | Modified Prophet | GRU using multivariate | Proposed
Short-term | 2 days (July 1~2) | CC | 0.39 | 0.94 | 0.95
Short-term | 2 days (July 1~2) | RMSE | 5765.38 | 1660.25 | 1588.59
Short-term | 2 days (July 1~2) | RMSSE | 36732.30 | 10577.77 | 10121.23
Short-term | 2 days (July 1~2) | SMAPE (%) | 189.47 | 187.43 | 186.45
Short-term | 7 days (Aug. 1~7) | CC | 0.69 | 0.95 | 0.95
Short-term | 7 days (Aug. 1~7) | RMSE | 6393.15 | 2521.99 | 2510.73
Short-term | 7 days (Aug. 1~7) | RMSSE | 76202.52 | 30060.66 | 29926.40
Short-term | 7 days (Aug. 1~7) | SMAPE (%) | 169.36 | 160.89 | 160.38
Medium-term | 15 days (Aug. 16~30) | CC | 0.47 | 0.96 | 0.96
Medium-term | 15 days (Aug. 16~30) | RMSE | 6141.05 | 1822.63 | 1756.73
Medium-term | 15 days (Aug. 16~30) | RMSSE | 110664.40 | 32844.71 | 31657.08
Medium-term | 15 days (Aug. 16~30) | SMAPE (%) | 177.34 | 170.66 | 170.62
Medium-term | 30 days (Sep. 1~30) | CC | 0.67 | 0.96 | 0.96
Medium-term | 30 days (Sep. 1~30) | RMSE | 6601.82 | 2272.38 | 2227.31
Medium-term | 30 days (Sep. 1~30) | RMSSE | 161311.70 | 55524.48 | 54423.11
Medium-term | 30 days (Sep. 1~30) | SMAPE (%) | 158.2 | 146.84 | 146.70
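For readers who want to reproduce metrics of the kind reported in Table 4, the sketch below uses the common textbook forms of the correlation coefficient (CC), RMSE, and SMAPE. These definitions are assumptions, since the formulas are not restated in this part of the paper; in particular, the SMAPE variant shown ranges from 0 to 200% and treats a 0/0 term as zero error. RMSSE is omitted because its normalization is not specified here. The sample values are made up.

```python
import math


def pearson_cc(y, yhat):
    """Pearson correlation coefficient between observed and predicted values."""
    my = sum(y) / len(y)
    mp = sum(yhat) / len(yhat)
    cov = sum((a - my) * (p - mp) for a, p in zip(y, yhat))
    var_y = sum((a - my) ** 2 for a in y)
    var_p = sum((p - mp) ** 2 for p in yhat)
    return cov / math.sqrt(var_y * var_p)


def rmse(y, yhat):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, yhat)) / len(y))


def smape(y, yhat):
    """Symmetric MAPE in percent (0 to 200); 0/0 terms count as zero error."""
    terms = []
    for a, p in zip(y, yhat):
        denom = (abs(a) + abs(p)) / 2
        terms.append(0.0 if denom == 0 else abs(p - a) / denom)
    return 100 * sum(terms) / len(terms)


# Made-up observed and predicted generation values.
y_obs = [0.0, 10.0, 20.0, 30.0]
y_pred = [0.0, 12.0, 18.0, 33.0]
scores = {
    "CC": pearson_cc(y_obs, y_pred),
    "RMSE": rmse(y_obs, y_pred),
    "SMAPE": smape(y_obs, y_pred),
}
```

With this SMAPE form, any interval in which exactly one of the observed and predicted values is zero contributes the maximum 200%, so series with many zero readings tend to score high even when the other metrics are good.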