New Decomposition Models for Hourly Direct Normal Irradiance Estimations for Southern Africa

Francisca Muriel Daniel-Durandt; Arnold Johan Rix

doi:10.20944/preprints202403.0416.v1

Submitted: 06 March 2024

Posted: 07 March 2024


Abstract

The research develops and validates new decomposition models for DNI estimations from Southern African data. The results demonstrated improved DNI estimation accuracy compared to the baseline models across all testing and validation datasets. These outcomes suggest that utilising a localised model can significantly enhance DNI estimations for Southern Africa and potentially for developing similar models in diverse geographic regions worldwide. Furthermore, clustered models highlighted the potential advantages of grouping data based on shared geographical and climatic attributes. This clustering approach could enhance decomposition model performance, particularly when local data is limited or data is available from multiple nearby stations. The Southern African decomposition model, which encompasses a wide spectrum of climatic regions and geographic locations, exhibited notable improvements over the baseline models despite occasional overestimation or underestimation. The overall metrics affirm the substantial advancement achieved with the Southern African model. This study focused on validating the model for hourly DNI in Southern Africa within a range of clearness index-intervals from 0.175 to 0.875. Implementing accurate decomposition models in developing countries can accelerate the adoption of renewable energy sources, diminishing reliance on coal and fossil fuels.

Keywords: Decomposition Model; Global Horizontal Irradiance; Direct Normal Irradiance; Solar Radiation Model

Subject: Engineering - Electrical and Electronic Engineering

1. Introduction

Photovoltaic (PV) systems require accurate modelling and monitoring to ensure their profitability. The irradiance received in the plane of the PV array is the foundation of designing, modelling and monitoring PV systems. The global plane-of-array irradiance (GPI) comprises the plane-of-array’s (POA) direct beam, ground-reflected and diffuse irradiance components. GPI is used to model and monitor PV systems because it determines the amount of solar power generated and is, therefore, one of the most important contributing factors in designing a PV system. The global horizontal irradiance (GHI), direct normal irradiance (DNI) and diffuse horizontal irradiance (DHI) components are required to calculate these irradiance components.

A transposition model calculates GPI ($G_{POA}$) from the irradiance components as

$$G_{POA} = G_{BC} + G_{DC} + G_{RC}. \tag{1}$$

$G_{BC}$ is the direct beam irradiance, $G_{RC}$ is the ground-reflected irradiance, and $G_{DC}$ is the diffuse irradiance component in the POA. GHI, DNI and DHI components are required to calculate $G_{BC}$, $G_{DC}$ and $G_{RC}$. The sum of the DNI projected onto the horizontal surface, using the cosine of the solar zenith angle $θ_{Z}$, and the DHI gives the GHI, as shown in Figure 1 [1]:

$$GHI = DNI \cdot \cos θ_{Z} + DHI. \tag{2}$$

GHI, DHI, and DNI units are in $W/m^{2}$.
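The closure relationship of Equation (2) can be illustrated numerically. The following is a minimal sketch; the function names and the sample values are hypothetical and chosen only for illustration.

```python
import math


def ghi_from_components(dni, dhi, zenith_deg):
    """Equation (2): GHI = DNI * cos(theta_Z) + DHI, all in W/m^2."""
    return dni * math.cos(math.radians(zenith_deg)) + dhi


def dhi_from_closure(ghi, dni, zenith_deg):
    """Equation (2) rearranged for DHI when GHI and DNI are measured."""
    return ghi - dni * math.cos(math.radians(zenith_deg))


# Illustrative values only: 800 W/m^2 DNI, 100 W/m^2 DHI, 60 degree zenith.
ghi = ghi_from_components(dni=800.0, dhi=100.0, zenith_deg=60.0)
dhi = dhi_from_closure(ghi, dni=800.0, zenith_deg=60.0)
```

Because the three components are tied together by this one equation, any two measurements fix the third, which is the property that decomposition models exploit.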

Most ground-based stations have at least measurements of GHI. Other measurements include radiometric data such as DNI, DHI and ultra-violet, and meteorologic data such as the temperature, pressure, rainfall, relative humidity, wind direction and wind speed. Pyranometers measure DHI and GHI, and the pyrheliometer measures DNI.

The GHI pyranometer has a hemispherical view and is mounted horizontally. The DHI pyranometer has a similar setup but is additionally shaded from direct sunlight. The pyrheliometer has a narrow field of view that only measures the beam directly from the Sun and is usually mounted on a Sun tracker for increased accuracy [2]. The irradiance measurements are converted to $W/m^{2}$ and logged accordingly.

Calibrating the equipment to the ISO 9060:1990 standard is necessary, and it is advisable to undergo recalibration every two years to ensure the reliability of measurements. The maintenance required is to clean the domes and regularly check and replace the desiccant, which keeps the instruments dry internally.

GHI, DNI and DHI are interdependent; therefore, having only two irradiance measurements is sufficient to estimate the third using decomposition models (sometimes also called separation models) [3]. If only the GHI is available, the DNI and DHI can also be estimated using decomposition models. Transposition models then calculate GPI using the irradiance components. GHI, DHI and DNI correlations are usually expressed empirically as a decomposition model [4].

Indices are relationships between different irradiance components. Decomposition and transposition models utilise these relationships.

The direct beam transmittance $K_{n}$ and the diffuse transmittance $K_{d}$ are defined as

$$K_{n} = \frac{DNI}{G_{0n}}, \tag{3}$$

$$K_{d} = \frac{DHI}{GHI}. \tag{4}$$

Liu and Jordan defined the clearness index $K_{t}$ as

$$K_{t} = \frac{GHI}{G_{0n} \cos θ_{Z}}. \tag{5}$$

All K-values ($K_{t}$, $K_{n}$ and $K_{d}$) are unitless.

The extraterrestrial irradiance on a normal surface $G_{0n}$ depends on the day of the year $n$, with the cosine argument in degrees:

$$G_{0n} = (\text{Solar Constant}) \left(1 + 0.033 \cdot \cos \left(\frac{360 \cdot n}{365}\right)\right). \tag{6}$$

The Solar Constant is usually taken as 1,367 $W/m^{2}$.

The horizontal extraterrestrial irradiance $G_{0h}$ is obtained by multiplying $G_{0n}$ by the cosine of $θ_{Z}$, as expressed in Equation (7):

$$G_{0h} = G_{0n} \cdot \cos θ_{Z}. \tag{7}$$
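Equations (3) to (7) translate directly into a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and the sample irradiance values are invented for the example.

```python
import math

SOLAR_CONSTANT = 1367.0  # W/m^2, the value quoted in the text


def extraterrestrial_normal(day_of_year):
    """Equation (6): G_0n, with the cosine argument in degrees."""
    angle = math.radians(360.0 * day_of_year / 365.0)
    return SOLAR_CONSTANT * (1.0 + 0.033 * math.cos(angle))


def irradiance_indices(ghi, dni, dhi, zenith_deg, day_of_year):
    """Equations (3)-(5) and (7): the unitless K_t, K_n and K_d."""
    g0n = extraterrestrial_normal(day_of_year)
    g0h = g0n * math.cos(math.radians(zenith_deg))  # Equation (7)
    return {"K_t": ghi / g0h, "K_n": dni / g0n, "K_d": dhi / ghi}


# Illustrative values only (they satisfy the closure of Equation (2)).
k = irradiance_indices(ghi=500.0, dni=800.0, dhi=100.0,
                       zenith_deg=60.0, day_of_year=172)
```

When the measurements satisfy Equation (2), the indices obey $K_{t} = K_{n} + K_{d} K_{t}$, which offers a quick consistency check on a dataset.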

Multipredictor decomposition models can improve accuracy compared to single predictor models [6]. However, the disadvantage is that multiple measurements must be available, which is not always the case for developing countries or brand-new sites of PV installations.

Boland et al. and Ridley et al. developed a logistic model to estimate solar diffuse radiation [7,8]. Soares et al., Talvitie et al., and Kalyanam and Hoffmann have proposed machine-learning-based models to predict solar diffuse and direct components [9,10,11]. Bessafi et al. have proposed a satellite-based decomposition model as an alternative to ground-based measurements [12], and Janjai et al. have proposed statistical models for estimating diffuse radiation [13].

Decomposition models have been developed by assessing previous models and improving the accuracy of their estimations. As more data and measurements become available, researchers have the opportunity to develop models for different climates and temporal resolutions. Most models predominantly use $K_{t}$. Other variables used in decomposition models include the solar altitude angle $β$ and the dew point temperature $T_{d}$. Using $K_{t}$ as the main predictor in decomposition models is popular because of its simplicity and applicability [6].

Orgill and Hollands developed a relationship between $K_{t}$ and $K_{d}$ [14], and Erbs et al. extended the $K_{t}$-$K_{d}$ relationship to latitudes from 31° to 42° North [15]. Louche et al. established a GHI and DNI relationship for a Mediterranean site to estimate $K_{n}$ using $K_{t}$ [16].

The Direct Insolation Simulation Code (DISC) was developed by Maxwell [17], and Perez et al. developed the Dirint model with the hopes of increasing the performance of the DISC model [18]. The Dirint model of Perez et al. has shown superior performance when estimating the DNI [19].

In Korea, Lee et al. developed a model using 6 Korean locations [20], and Lee et al. developed a new model using Maxwell’s DISC model by refitting the coefficients [3]. Skartveit and Olseth developed a DNI estimation model using the solar elevation angle for Norway based on hourly GHI and DHI records [21].

Lam and Li derived $K_{d}$ for Hong Kong [22]. Reindl et al. determined $K_{d}$ using two models with $K_{t}$ and $β$ [23].

The main limitations of decomposition models are that some have limited climate scope, and the dataset’s temporal resolution affects the irradiance estimation accuracy. A decomposition model in a tropical climate may be unsuitable for a desert climate and vice versa. Intra-hourly-based models perform differently from daily- or monthly-based models, which is why many available decomposition models exist.

The accuracy of decomposition models has been evaluated for several regions, such as Belgium [4], China [24], the USA [19], and North Africa [25].

Gueymard and Ruiz-Arias provided an extensive study of 140 available decomposition models. The authors state that the predicted DNI’s accuracy highly depends on the decomposition model. Validation studies exist but are limited to a few models and test stations, i.e. biased to a specific location or climate [26]. Research indicates that no decomposition model has been developed and validated for South Africa.

Laiti et al. state that, in general, decomposition models tend to overestimate DHI and underestimate DNI. Typically, models underestimate DHI in overcast periods and overestimate it during clear-sky periods [19].

Higher-resolution data include higher $K_{t}$ values, resulting in extreme overestimations of DNI. Consequently, hourly DNI estimates have higher accuracy than 1-minute DNI estimates. Nevertheless, subhourly estimations would be highly beneficial for real-time monitoring and forecasting of solar power [26].

Figure 2 shows, in green, the testing and validation countries of common decomposition models, such as those of Orgill and Hollands, Erbs et al., Louche et al., Reindl et al., DISC (Maxwell), Dirint (Perez et al.), Lee et al., Lee et al., Skartveit and Olseth, and Lam and Li [3,14,15,16,17,18,20,21,22,23].

Decomposition models developed for South America cover Brazil [28] and Argentina and Brazil [29]. Northern African models include Nigeria [30], Algeria [31] and Morocco [32].

Engerer developed a model for Australia and observed that the model only slightly outperformed the Dirint model [33]. The BRL model by Ridley et al. developed a method to construct multiple variable logistic models for the diffuse solar fraction, which includes Mozambique [8]. Figure 2 represents these discussed models [8,28,29,30,31,32,33] in red.

South African research on decomposition models includes the following: Tsubo and Walker published the only Southern African-based study on the relationship between radiation and $K_{t}$ [34]. However, this relationship concerns photosynthetically active radiation related to agricultural practices, not PV systems. Clear-sky model assessments and validation studies have been performed by [35] and [36] for Southern African countries. Clear-sky models simplify atmospheric attenuation to estimate solar irradiance under clear-sky conditions and are not decomposition models. These studies are therefore not included as comparison models, as they are not relevant to this research.

Mahachi’s thesis assessed decomposition and transposition models in South Africa and showed that the models tend to overestimate the DHI but underestimate the DNI [37]. Furthermore, the DISC and Dirint decomposition models showed the most accurate estimations of the DNI and DHI for the South African climatic conditions [38].

As discussed, decomposition models are empirical relationships between GHI, DHI and DNI. All three irradiance components are required to estimate GPI. Decomposition models are useful because they reduce the measurement equipment required by decomposing one irradiance component into the two others; for example, GHI is used to estimate DHI and DNI.

Most decomposition models are not universally applicable: they are localised to a specific climate, and their temporal resolution is not always transferable. Little literature has been published representing the Southern African region in decomposition models, which this research article attempts to address.

2. Model Development

The methodology to develop a novel decomposition model is based on selected data from the automated QC procedure and addresses three geographical models:

a localised decomposition model, which is site-specific;
a clustered decomposition model, which encapsulates several sites to group an area based on their geographical location;
and a regional (Southern African) model, which encapsulates the data from the SAURAN network for developing a model specific to Southern Africa.

2.1. SAURAN Database

Table 1 summarises the SAURAN stations’ corresponding geographical information, such as latitude, longitude, and elevation above sea level.

Table 3 shows the data points available for the model development, taken from Table 2. Further, the data points assessed have $K_{t}$ between 0.175 and 0.875.

The data points are hourly measurements of the GHI, DNI and DHI. The split of the train-validation-test datasets is 50:25:25, with the exception of two datasets, ILA and MIN. The ILA and MIN datasets have a 0:0:100 data split and serve as two unseen datasets in the test study.
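A 50:25:25 split can be sketched as follows. The rounding rule reproduces the set sizes listed in Table 3 for the stations checked here (for example CSIR and CUT), but the random shuffling, the seed and the function names are assumptions made for illustration; the source does not state how the rows were assigned to each set.

```python
import numpy as np


def split_sizes(n_total, train_fraction=0.5):
    """Sizes of the train, validation and test sets for a 50:25:25 split."""
    n_train = int(n_total * train_fraction)
    n_val = (n_total - n_train) // 2
    n_test = n_total - n_train - n_val
    return n_train, n_val, n_test


def split_indices(n_total, seed=0):
    """Assumed random assignment of row indices to the three sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_total)
    n_train, n_val, _ = split_sizes(n_total)
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])


train_idx, val_idx, test_idx = split_indices(14991)  # CSIR total in Table 3
```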

Table 3 also shows each station’s mean GHI, DNI, and DHI determined after applying the QC procedure.

Table 2. SAURAN database and dataset sizes from [39].

Station	Dataset size before QC	Dataset size after QC	Start Date	End Date
CSIR	46,434	26,539	11 March 2017	31 October 2022
CUT	28,077	14,619	24 October 2017	31 October 2022
FRH	40,895	22,233	7 February 2017	24 February 2022
GRT	18,541	9774	27 November 2013	24 January 2016
HLO	21,532	11,728	8 October 2015	27 October 2020
ILA	8832	4676	13 October 2021	31 October 2022
KZH	52,323	38,898	7 December 2015	07 August 2022
KZW	20,291	10,756	7 December 2015	12 December 2018
MIN	8185	4423	28 October 2021	31 October 2022
MRB	4201	2462	17 March 2017	22 October 2019
NMU	39,969	23,130	10 December 2015	30 September 2022
NUST	52,004	27,401	26 July 2016	31 October 2022
PMB	9773	5415	13 July 2021	31 October 2022
RVD	63,716	34,457	27 March 2014	28 July 2021
SALT	14,151	9908	21 July 2017	22 December 2020
STA	40,256	21,751	7 December 2015	19 April 2021
SUN	87,720	47,733	24 May 2010	31 October 2022
SUT	1715	902	8 February 2017	20 April 2017
UBG	38,917	20,646	26 November 2014	6 November 2020
UFS	31,665	17,152	16 January 2014	30 August 2017
UNV	59,100	33,144	23 April 2015	31 October 2022
UNZ	56,399	30,373	11 July 2014	31 October 2022
UPR	78,792	42,128	19 September 2013	31 October 2022
VAN	24,701	13,234	26 August 2016	10 July 2019

Table 3. Model development stations indicating the mean GHI, DNI and DHI and the sizes of the training, validation and testing sets.

Station	Mean¹ GHI [ $W / m^{2}$ ]	Mean¹ DNI [ $W / m^{2}$ ]	Mean¹ DHI [ $W / m^{2}$ ]	Dataset² Total	Dataset² Train	Dataset² Validation	Dataset² Test	Cluster Allocation
CSIR	575	599	167	14,991	7,495	3,748	3,748	2
CUT	609	639	159	9,161	4,580	2,290	2,291	2
FRH	544	583	151	12,224	6,112	3,056	3,056	4
GRT	573	624	151	5,788	2,894	1,447	1,447	4
HLO	550	608	138	7,061	3,530	1,765	1,766	1
ILA	589	680	131	2,709	0	0	2,709	1
KZH	533	517	179	8,782	4,391	2,195	2,196	3
KZW	531	511	184	5,945	2,972	1,486	1,487	3
NMU	556	545	165	10,562	5,281	2,640	2,641	4
NUST	614	670	149	15,901	7,950	3,975	3,976	1
MIN	564	573	161	2,761	0	0	2,761	2
RVD	630	729	125	19,624	9,812	4,906	4,906	1
SUN	556	645	133	28,508	14,254	7,127	7,127	1
UBG	591	602	158	12,137	6,068	3,034	3,035	2
UFS	567	654	137	10,257	5,128	2,564	2,565	2
UNV	579	524	197	15,874	7,937	3,968	3,969	2
UNZ	530	528	176	10,055	5,027	2,514	2,514	3
UPR	568	609	163	28,089	14,044	7,022	7,023	2
VAN	597	683	126	7,860	3,930	1,965	1,965	1

¹ Daylight average. ² Dataset size after quality control as in [40] and for $0.175 \leq K_{t} \leq 0.875$.

2.2. Comparison Metrics

The comparison metrics are the root mean square error (RMSE), mean absolute error (MAE) and mean bias error (MBE):

$$\begin{aligned} RMSE &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_{i} - \hat{x}_{i})^{2}}, \\ MAE &= \frac{1}{n} \sum_{i=1}^{n} | x_{i} - \hat{x}_{i} |, \\ MBE &= \frac{1}{n} \sum_{i=1}^{n} (x_{i} - \hat{x}_{i}), \end{aligned} \tag{8}$$

where $x_{i}$ is the measured value, $\hat{x}_{i}$ is the predicted value and $n$ is the number of data points. A low RMSE and MAE indicate a good model, whereas the MBE should be close to zero. RMSE indicates the concentration of data around the line of best fit. Therefore, a smaller RMSE is indicative of a more accurate model.

The Pearson correlation coefficient $r$ indicates the correlation between data:

$$r = \frac{\sum (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum (x_{i} - \bar{x})^{2} \sum (y_{i} - \bar{y})^{2}}}. \tag{9}$$

In Equation (9), $x_{i}$ and $y_{i}$ represent the individual points with index $i$, and $\bar{x}$ and $\bar{y}$ represent the means of the $x$ and $y$ sample sets. An $r$ closer to -1 indicates a negative correlation, meaning that if one variable increases, the other decreases. In contrast, an $r$ closer to 1 indicates a positive correlation, meaning that if one variable increases, the other also increases [41].

The statistical indicators used as comparison metrics are the MBE, RMSE and MAE, all expressed as a percentage of the mean measured DNI [26], and $R^{2}$. Further comparison metrics are the MAE in two $K_{t}$-intervals: $K_{t} < 0.60$ and $K_{t} \geq 0.60$.

The MBE indicates whether a model overestimates or underestimates the DNI, and the RMSE indicates the deviation of the errors. A significant difference between MAE and RMSE indicates a larger variance in the individual errors. Lower RMSE and MAE are ideal, whereas an MBE closer to zero is optimal. The MAE is an unbiased estimator and is also used to evaluate the two $K_{t}$ intervals. Lower and higher $K_{t}$ indicate overcast and clear-sky conditions, respectively. Therefore, the two $K_{t}$ intervals assess the models under varying weather conditions.
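The metrics of Equation (8), expressed as a percentage of the mean measured DNI, can be computed as in the sketch below. The function name and the sample values are hypothetical. Normalising the two interval MAEs by the overall mean measured DNI, rather than by each interval's own mean, is an assumption, as the text does not specify it.

```python
import numpy as np


def dni_metrics(measured, predicted, kt, kt_split=0.60):
    """RMSE, MAE and MBE in percent of the mean measured DNI."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    kt = np.asarray(kt, dtype=float)
    error = measured - predicted       # sign convention of Equation (8)
    scale = 100.0 / measured.mean()    # converts W/m^2 to percent
    low = kt < kt_split                # overcast-leaning interval
    return {
        "RMSE_%": scale * np.sqrt(np.mean(error**2)),
        "MAE_%": scale * np.mean(np.abs(error)),
        "MBE_%": scale * np.mean(error),
        "MAE_low_kt_%": scale * np.mean(np.abs(error[low])),
        "MAE_high_kt_%": scale * np.mean(np.abs(error[~low])),
    }


# Illustrative values only.
metrics = dni_metrics(measured=[100.0, 300.0, 500.0, 700.0],
                      predicted=[120.0, 280.0, 500.0, 740.0],
                      kt=[0.3, 0.5, 0.7, 0.8])
```

With the measured-minus-predicted convention of Equation (8), a negative MBE corresponds to a model that overestimates the DNI on average.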

2.3. Regression and Fitting

The relationship between two variables is quantified using statistical methods like regression. Regression techniques can be linear, multi-linear and non-linear.

A linear relationship is defined as

$$y = b_{0} + b_{1} x, \tag{10}$$

where $y$ is the response, $x$ is the regressor, $b_{0}$ is the intercept, and $b_{1}$ is the slope. A regression analysis quantifies the strength of a relationship between $y$ and $x$ [41].

The least squares method estimates $b_{0}$ and $b_{1}$ so that the sum of the squares of the residuals is at a minimum. The residual sum of squares is denoted as $SSE$ and is the sum of squares of the errors about the regression line. Thus, the method minimises

$$SSE = \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}, \tag{11}$$

where $\hat{y}_{i}$ denotes the predicted or fitted value.

The coefficient of determination, $R^{2}$, indicates how good the fit of a model is and is a number between zero and one:

$$R^{2} = 1 - \frac{\sum (y_{i} - \hat{y}_{i})^{2}}{\sum (y_{i} - \bar{y})^{2}}. \tag{12}$$

A higher $R^{2}$-value indicates that the model explains more of the variation in the response variable around its mean and that the regression model fits the observations better [41].

Polynomial regression models a dependent variable, $y$, as an $r^{th}$-degree polynomial of $x$:

$$y = b_{0} + b_{1} x + b_{2} x^{2} + \dots + b_{r} x^{r}. \tag{13}$$

Exponential regression is where the best fit of an equation is an exponential function, like

$$y = a + b c^{x}, \tag{14}$$

or

$$y = a + b e^{c x}. \tag{15}$$

Multi-linear regression models a response variable as a linear function of multiple regressors:

$$y_{i} = b_{0} + b_{1} x_{1i} + b_{2} x_{2i} + \dots + b_{k} x_{ki}. \tag{16}$$
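A least squares polynomial fit, together with the $SSE$ of Equation (11) and the $R^{2}$ of Equation (12), can be sketched with NumPy. This is only an illustration on noise-free synthetic data; the helper name `fit_polynomial` is hypothetical and is not taken from the source.

```python
import numpy as np


def fit_polynomial(x, y, degree):
    """Least squares fit of Equation (13); returns (b, SSE, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)         # highest power first
    y_hat = np.polyval(coeffs, x)
    sse = float(np.sum((y - y_hat) ** 2))     # Equation (11)
    sst = float(np.sum((y - y.mean()) ** 2))  # variation around the mean
    return coeffs[::-1], sse, 1.0 - sse / sst  # b_0 first; Equation (12)


# Noise-free synthetic data generated from y = 1 + 2x - 0.5x^2.
x = np.linspace(0.0, 4.0, 9)
y = 1.0 + 2.0 * x - 0.5 * x**2
b, sse, r2 = fit_polynomial(x, y, degree=2)
```

On real measurements the $SSE$ is non-zero, and $R^{2}$ then reports the share of the variation around the mean that the fitted curve explains.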

2.4. Software Development Tools

The model development utilises a combination of data science applications and modelling. The primary tool is the open-source language Python with the Anaconda interface [42] and various available libraries [43,44,45].

2.5. Baseline Models

Three comparative models are used as a baseline to compare the new models. Based on the literature, the DISC and Dirint models performed well for Southern African climates [38,46].

The Dirint [18] and Lee [3] models are also used for comparison because their foundation is similar to the DISC model [17].

Maxwell’s DISC quasi-physical approach has three assumptions [3]:

The relative air mass $AM$ is the dominant parameter affecting the relationship between $K_{n}$ and $K_{t}$;
The physical model used to calculate $K_{n}$ will provide a physically-based reference from which the changes in $K_{n}$ can be calculated (see Equation (20) below);
Seasonal, annual and climate variations in the relationship between $K_{n}$ and $K_{t}$ are fully accounted for by parametric functions in $K_{t}$ that relate $ΔK_{n}$ to $AM$, cloud cover and precipitable water (PW) vapour.

$AM$ is defined as [47]

$$AM = \left[\cos θ_{Z} + 0.5057 \cdot (96.080 - θ_{Z})^{-1.634}\right]^{-1}. \tag{17}$$

The absolute AM ($AM_{a}$) is the pressure-normalised AM, expressed as

$$AM_{a} = AM \left(\frac{P}{P_{o}}\right), \tag{18}$$

where $P$ refers to the atmospheric pressure at the test site, and $P_{o}$ is the atmospheric pressure at sea level.
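Equations (17) and (18) can be evaluated as in the following sketch. The constants are those printed in Equation (17); the function names and the default sea-level pressure of 101,325 Pa (the standard atmosphere) are assumptions made for illustration.

```python
import math


def relative_air_mass(zenith_deg):
    """Equation (17); valid for zenith angles below 96.08 degrees."""
    cos_z = math.cos(math.radians(zenith_deg))
    return 1.0 / (cos_z + 0.5057 * (96.080 - zenith_deg) ** -1.634)


def absolute_air_mass(zenith_deg, pressure, pressure_sea_level=101325.0):
    """Equation (18): AM scaled by site pressure over sea-level pressure."""
    return relative_air_mass(zenith_deg) * pressure / pressure_sea_level


am_overhead = relative_air_mass(0.0)   # Sun at the zenith
am_60 = relative_air_mass(60.0)        # roughly twice the overhead path
am_high_site = absolute_air_mass(60.0, pressure=85000.0)  # hypothetical site
```

For a high-elevation site, where the pressure is lower than at sea level, the absolute air mass is smaller than the relative air mass at the same zenith angle.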

The modelled DNI is determined by rearranging Equation (3):

$$DNI = G_{0n} \cdot K_{n}, \tag{19}$$

where

$$K_{n} = K_{nc} - ΔK_{n}, \tag{20}$$

and

$$ΔK_{n} = a_{DISC} + b_{DISC} e^{c_{DISC} \cdot AM}. \tag{21}$$

The clear-sky limit $K_{nc}$ is a polynomial in $AM$:

$$K_{nc} = 0.866 - 0.122 AM + 0.0121 AM^{2} - 0.000653 AM^{3} + 0.000014 AM^{4}. \tag{22}$$

Two $K_{t}$ intervals determine the coefficients $a_{DISC}$, $b_{DISC}$ and $c_{DISC}$: $K_{t} \leq 0.60$ and $K_{t} > 0.60$.

For $K_{t} \leq 0.60$:

$$\begin{aligned} a_{DISC} &= 0.512 - 1.56 K_{t} + 2.286 K_{t}^{2} - 2.222 K_{t}^{3}, \\ b_{DISC} &= 0.370 + 0.962 K_{t}, \\ c_{DISC} &= -0.280 + 0.932 K_{t} - 2.048 K_{t}^{2}. \end{aligned} \tag{23}$$

For $K_{t} > 0.60$:

$$\begin{aligned} a_{DISC} &= -5.743 + 21.77 K_{t} - 27.49 K_{t}^{2} + 11.56 K_{t}^{3}, \\ b_{DISC} &= 41.4 - 118.5 K_{t} + 66.05 K_{t}^{2} + 31.90 K_{t}^{3}, \\ c_{DISC} &= -47.01 + 184.2 K_{t} - 222.0 K_{t}^{2} + 73.81 K_{t}^{3}. \end{aligned} \tag{24}$$

Maxwell’s model possesses a different functional form because the quasi-physical approach is applied; therefore, it partially reflects the physics involved in the atmospheric transmission of solar radiation [3]. Maxwell adopted the Bird clear-sky model for $K_{nc}$ (see Equation (22)) and then fitted the parameters $a_{DISC}$, $b_{DISC}$ and $c_{DISC}$, as described in Equations (23) and (24), to solar radiation data from Atlanta, Georgia, USA, 1981 [17].

The DISC model, termed ‘quasi-physical’, combines a clear-sky model with experimental fits for other sky conditions. The model is a clear-sky irradiance attenuated by a function of $K_{t}$. Maxwell derived the empirical regressions from 12 years of recorded radiation data at 70 stations [4,17].
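The DISC relationships of Equations (19) to (24) can be collected into one function. The following is a minimal sketch rather than Maxwell's reference code: the function name `disc_dni` and the clipping of negative results to zero are assumptions made for illustration, and the coefficients are those of Maxwell's DISC model [17].

```python
import math


def disc_dni(kt, air_mass, g0n):
    """Hourly DNI in W/m^2 from K_t, AM and G_0n using the DISC model."""
    if kt <= 0.60:
        a = 0.512 - 1.56 * kt + 2.286 * kt**2 - 2.222 * kt**3
        b = 0.370 + 0.962 * kt  # linear in K_t in Maxwell's formulation
        c = -0.280 + 0.932 * kt - 2.048 * kt**2
    else:
        a = -5.743 + 21.77 * kt - 27.49 * kt**2 + 11.56 * kt**3
        b = 41.4 - 118.5 * kt + 66.05 * kt**2 + 31.90 * kt**3
        c = -47.01 + 184.2 * kt - 222.0 * kt**2 + 73.81 * kt**3
    delta_kn = a + b * math.exp(c * air_mass)              # Equation (21)
    knc = (0.866 - 0.122 * air_mass + 0.0121 * air_mass**2
           - 0.000653 * air_mass**3 + 0.000014 * air_mass**4)  # Eq. (22)
    kn = knc - delta_kn                                    # Equation (20)
    return max(0.0, g0n * kn)  # Equation (19); clipping is an assumption


# Illustrative inputs only: a clear hour and a cloudy hour at AM = 1.5.
dni_clear = disc_dni(kt=0.75, air_mass=1.5, g0n=1367.0)
dni_cloudy = disc_dni(kt=0.30, air_mass=1.5, g0n=1367.0)
```

A higher $K_{t}$ yields a smaller $ΔK_{n}$ and therefore a DNI closer to the clear-sky limit, which is the behaviour the two sample calls illustrate.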

The Dirint model is based on the DISC model and was developed by Perez et al. [18]. The goal was to improve the accuracy of the DISC model by Maxwell [17].

The Dirint model uses a clearness index variation parameter $K_{t}^{'}$:

$$K_{t}^{'} = \frac{K_{t}}{1.031 e^{-1.4 / (0.9 + 9.4 / AM)} + 0.1}. \tag{25}$$

Furthermore, a stability index parameter $ΔK_{t}^{'}$,

$$ΔK_{t}^{'} = 0.5 \left(| K_{t(i)}^{'} - K_{t(i+1)}^{'} | + | K_{t(i)}^{'} - K_{t(i-1)}^{'} |\right), \tag{26}$$

considers the previous ($i-1$), current ($i$) and next ($i+1$) hourly record. When the preceding or following hourly record is missing, $ΔK_{t}^{'}$ is

$$ΔK_{t}^{'} = | K_{t(i)}^{'} - K_{t(i \pm 1)}^{'} |. \tag{27}$$

A low $ΔK_{t}^{'}$ indicates stable conditions, whereas a high $ΔK_{t}^{'}$ characterises unstable conditions, which allows the distinction between hazy and partly cloudy conditions. The dew point temperature $T_{d}$ is an adequate estimator of atmospheric precipitable water (PW) [18]. The Dirint model’s atmospheric PW ($W$) is estimated using:

$$W = \exp (0.07 \cdot T_{d} - 0.075). \tag{28}$$
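The three Dirint input parameters of Equations (25) to (28) can be sketched as below. The function names are hypothetical, missing neighbours are represented by `None` as a convention of this example, and the units of Equation (28) (dew point in degrees Celsius, $W$ in centimetres) are an assumption, since the text does not state them.

```python
import math


def kt_prime(kt, air_mass):
    """Equation (25): clearness index variation parameter."""
    return kt / (1.031 * math.exp(-1.4 / (0.9 + 9.4 / air_mass)) + 0.1)


def delta_kt_prime(previous, current, following):
    """Equations (26) and (27); pass None for a missing neighbour."""
    diffs = [abs(current - v) for v in (previous, following) if v is not None]
    if not diffs:
        return None  # no neighbouring record, so no stability index
    return sum(diffs) / len(diffs)  # mean of two terms, or the single term


def precipitable_water(dew_point):
    """Equation (28): W estimated from the dew point temperature T_d."""
    return math.exp(0.07 * dew_point - 0.075)


stable = delta_kt_prime(0.50, 0.50, 0.52)   # small hour-to-hour change
one_sided = delta_kt_prime(None, 0.5, 0.7)  # first record of a day
```

At $AM = 1$ the denominator of Equation (25) is very close to one, so $K_{t}^{'}$ and $K_{t}$ nearly coincide; the correction matters at larger air masses.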

The Dirint model is a four-dimensional conditional model in $θ_{Z}$, $K_{t}$, $ΔK_{t}^{'}$ and $W$. Based on the four-dimensional model, the hourly DNI is calculated as

$$DNI = \frac{G_{0n} \cdot k_{b}^{'} e^{-1.4 / (0.9 + 9.4 / AM)}}{0.87291}, \tag{29}$$

where

$$k_{b}^{'} = \begin{cases} 0 & \text{for } K_{t}^{'} < 0.2, \\ a_{Dirint} K_{t}^{'} + b_{Dirint} & \text{otherwise}. \end{cases} \tag{30}$$

The coefficients $a_{Dirint}$ and $b_{Dirint}$ are taken from a complex four-dimensional lookup table.

Lee et al. created a new model for Korea with the same format as Maxwell’s DISC model:

$$\begin{aligned} a_{Lee} &= 0.342 - 0.3782 K_{t}, \\ b_{Lee} &= 0.5329 + 0.2676 K_{t} - 0.0216 K_{t}^{2} + 0.1584 K_{t}^{3}. \end{aligned} \tag{31}$$

For $K_{t} \leq 0.5$,

$$c_{Lee} = -0.2117 - 0.0513 K_{t} + 1.2976 K_{t}^{2} - 3.3222 K_{t}^{3}, \tag{32}$$

and for $K_{t} > 0.5$,

$$c_{Lee} = 0.7221 - 10.2801 K_{t} + 30.3285 K_{t}^{2} - 27.9766 K_{t}^{3}. \tag{33}$$

The evaluation consists of comparing the localised, clustered and regional models against the three baseline models: DISC, Dirint and Lee. The DISC and Dirint models were selected based on their performance in estimating DNI for Southern African climates. The Lee and Dirint models have foundational similarities to the DISC model. These baselines are used to assess whether the newly developed decomposition models improve the accuracy of hourly DNI estimations for Southern Africa. The accuracy evaluation uses the comparison metrics discussed in Section 2.2.

2.6. Decomposition Model Development Methodology

The methodology builds on the DISC model. The DISC model stands out as one of the better-performing models for estimating DNI for South Africa [38]. Its simplicity is evident in its lack of need for a complex four-dimensional lookup table, unlike the Dirint model.

The original DISC model uses Equation (21), an exponential function. However, the regression model for an exponential function, as discussed in Section 2.3, showed difficulty in finding optimal $a$, $b$ and $c$ coefficients in all cases. Instead, a second-order polynomial function of $AM$,

$$ΔK_{n} = a + b \cdot AM + c \cdot AM^{2}, \tag{34}$$

is a suitable substitute with similar regression results.

The training set then fits $a$, $b$ and $c$ for the intervals $K_{t} \leq 0.60$ and $K_{t} > 0.60$:

$$\begin{aligned} a &= a_{0} + a_{1} K_{t} + a_{2} K_{t}^{2} + a_{3} K_{t}^{3}, \\ b &= b_{0} + b_{1} K_{t} + b_{2} K_{t}^{2} + b_{3} K_{t}^{3}, \\ c &= c_{0} + c_{1} K_{t} + c_{2} K_{t}^{2} + c_{3} K_{t}^{3}, \end{aligned} \tag{35}$$

and the validation and testing sets evaluate the model’s accuracy.

Each model development undergoes the following initial processing steps:

Empirical formulae estimate $θ_{Z}$, $AM$, pressure, $G_{0n}$, $K_{t}$ and $K_{n}$. From this, the assessment of available models aids in developing a new model;
Data is split into $K_{t}$ intervals of 0.05, from 0.175 to 0.875;
$ΔK_{n}$ is then modelled as Equation (34);
Each interval is then fitted against Equation (34) to determine the $a$, $b$ and $c$ coefficients using a least squares regression analysis;
From the $K_{t}$-interval fits, the $a_{0}$-$a_{3}$, $b_{0}$-$b_{3}$ and $c_{0}$-$c_{3}$ coefficients are fitted to the polynomials of Equation (35) with regard to $K_{t}$;
These equations can be used to determine $ΔK_{n}$ and $K_{n}$, which, in turn, are used to calculate the DNI (see Equations (19) and (20)).
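The two-stage fitting procedure listed above can be sketched with NumPy. This is an illustration under stated assumptions and not the authors' implementation: the function name `fit_delta_kn_model`, the use of bin centres as the $K_{t}$ value of each interval, and the synthetic noise-free data are all hypothetical choices.

```python
import numpy as np


def fit_delta_kn_model(kt, air_mass, delta_kn, split=0.60):
    """Fit Equation (34) per K_t bin, then Equation (35) per K_t range."""
    kt, air_mass, delta_kn = (
        np.asarray(v, dtype=float) for v in (kt, air_mass, delta_kn)
    )
    edges = np.linspace(0.175, 0.875, 15)  # 14 bins of width 0.05
    centres, abc = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (kt >= lo) & (kt < hi)
        if in_bin.sum() < 3:  # a quadratic needs at least three points
            continue
        # np.polyfit returns the highest power first: c, then b, then a.
        c, b, a = np.polyfit(air_mass[in_bin], delta_kn[in_bin], 2)
        centres.append(0.5 * (lo + hi))
        abc.append((a, b, c))
    centres, abc = np.array(centres), np.array(abc)

    low = centres <= split + 1e-9  # tolerance guards against round-off
    model = {}
    for name, sel in (("low", low), ("high", ~low)):
        # One cubic in K_t per coefficient; each row is [x0, x1, x2, x3].
        model[name] = np.array(
            [np.polyfit(centres[sel], abc[sel, j], 3)[::-1] for j in range(3)]
        )
    return model


# Synthetic, noise-free data with constant a, b, c (illustrative values).
rng = np.random.default_rng(1)
kt = rng.uniform(0.175, 0.875, 5000)
am = rng.uniform(1.0, 5.0, 5000)
delta_kn = 0.30 + 0.05 * am - 0.002 * am**2
model = fit_delta_kn_model(kt, am, delta_kn)
```

In this sketch, `model["low"]` and `model["high"]` each hold three rows for $a$, $b$ and $c$, so that evaluating a row as a cubic in $K_{t}$ returns the coefficient needed in Equation (34).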

For each SAURAN station, a localised decomposition model is developed. A clustered decomposition model describes an area with similar irradiance patterns using the clustered areas discussed in [40]. Farmer and Rix first presented a two-cluster correlation map using the SAURAN database [48] and, by using this approach, this study formulated four clusters instead of two in Southern Africa, as shown in Figure 3a.

Figure 3a shows the clusters’ geographical location, and Figure 3b shows the penetration levels of GHI. Table 4 shows the different clusters’ training sets’ mean GHI, DNI and DHI.

Figure 3. Clusters within the Southern African context.

Cluster 1 receives the most GHI and DNI, and Cluster 3 receives the least, as evident from Figure 3b. The different climates are also evident in these clusters: Cluster 3 is more humid and receives, on average, more DHI than Cluster 1.

Figure 4 shows how the cluster data is combined. Each cluster and the regional (Southern African) model are combined with even distributions of datasets to avoid introducing a bias, as some stations are over-represented in the original data set. Some stations, such as the SUN, UPR and RVD stations, have considerably more data available as they are either older stations or have not been closed down.

The different stations have varying climates, and therefore, a larger representation of one station will result in a biased model towards that station. The advantage of the even distribution is that every station is sufficiently represented and will not cause a model bias, but this reduces the amount of available data.
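One way to obtain an even distribution is to draw the same number of rows from every station. The sketch below assumes that each station is downsampled to the size of the smallest training set in the cluster; the source does not state the exact rule, so both that choice and the function name are hypothetical. The sizes are the Cluster 1 training set sizes from Table 3.

```python
import numpy as np


def balanced_sample(station_sizes, seed=0):
    """Pick an equal number of row indices from every station."""
    rng = np.random.default_rng(seed)
    n_per_station = min(station_sizes.values())  # assumed balancing rule
    return {
        name: np.sort(rng.choice(size, size=n_per_station, replace=False))
        for name, size in station_sizes.items()
    }


# Training set sizes of the Cluster 1 stations (Table 3).
cluster_1_train = {"HLO": 3530, "NUST": 7950, "RVD": 9812,
                   "SUN": 14254, "VAN": 3930}
picked = balanced_sample(cluster_1_train)
```

Under this rule the combined Cluster 1 training set would hold 5 × 3,530 rows, which illustrates the trade-off noted above: no station dominates, but much of the data from the larger stations goes unused.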

Cluster 2’s stations have higher elevation and summer humidity due to its warm, rainy summers and dry, cold winters. The expected annual irradiance levels are lower, as seen in Figure 3b. The stations have higher humidity because of their location and higher DHI levels.

The two stations in Cluster 2, UPR and CSIR, are expected to have more diffuse particles due to the higher air pollution levels and, therefore, higher DHI levels. Cluster 2 has a large bias of the data from Pretoria, South Africa, from the CSIR and UPR datasets.

Cluster 4 has lower annual irradiance levels, as seen in Figure 3b, and FRH and NMU are closer to the coastline, whereas GRT is inland.

3. Development of New Decomposition Models

This section discusses the newly developed a, b and c coefficients of Equation (34).

The section consists of three subsections:

The localised decomposition models, developed using the training dataset of each SAURAN station;
The clustered decomposition models, which are modelled on the training data of all the stations within the cluster, as shown in Figure 4;
The regional model, which is modelled on all the stations’ training data (Table 3).

3.1. Localised Decomposition Models

The localised decomposition model equations for the a, b and c coefficients are presented in Appendix A.

3.2. Cluster Decomposition Models

Figure 5, Figure 6, Figure 7 and Figure 8 show the different corresponding clusters’ model coefficients.

3.2.1. Cluster 1

Cluster 1 comprises the HLO, NUST, RVD, SUN and VAN datasets, as shown in Figure 3a and Figure 4.

Figure 5 shows the Cluster 1 and five stations’ a, b and c coefficients. The discussion of the different stations is in Appendix A under Subsections A.5 (HLO), A.11 (NUST), A.12 (RVD), A.13 (SUN) and A.19 (VAN).

The RVD model is the only model showing difficulty fitting the coefficients with $K_{t}$. Table 3 indicates that the RVD station has the highest mean DNI and GHI, with the lowest DHI measurements, compared to the rest of Cluster 1’s stations.

Figure 5. Cluster 1 coefficients in $ΔK_{n} = a + b \cdot AM + c \cdot AM^{2}$.

The coefficients for Cluster 1 are

$$\begin{aligned} a &= \begin{cases} 2.4134 - 15.428 K_{t} + 50.7433 K_{t}^{2} - 50.3864 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ 18.4363 - 61.7241 K_{t} + 67.5365 K_{t}^{2} - 23.9963 K_{t}^{3} & \text{for } K_{t} \geq 0.60, \end{cases} \\ b &= \begin{cases} -1.4538 + 13.4628 K_{t} - 43.4278 K_{t}^{2} + 40.7081 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ -7.7071 + 23.0993 K_{t} - 20.5561 K_{t}^{2} + 4.7883 K_{t}^{3} & \text{for } K_{t} \geq 0.60, \end{cases} \\ c &= \begin{cases} 0.2232 - 2.1593 K_{t} + 6.6964 K_{t}^{2} - 6.0805 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ 0.7064 - 1.9679 K_{t} + 1.5214 K_{t}^{2} - 0.2153 K_{t}^{3} & \text{for } K_{t} \geq 0.60. \end{cases} \end{aligned} \tag{36}$$
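As a worked illustration, the Cluster 1 coefficients of Equation (36) can be evaluated for a given $K_{t}$ and $AM$ and combined with Equations (19), (20) and (34). The sketch below is not the authors' code: the names are hypothetical, and the clear-sky limit $K_{nc}$ is passed in as an argument rather than computed.

```python
# Cubic coefficients [x0, x1, x2, x3] of Equation (36):
# the first tuple applies for K_t < 0.60, the second for K_t >= 0.60.
CLUSTER_1 = {
    "a": ((2.4134, -15.428, 50.7433, -50.3864),
          (18.4363, -61.7241, 67.5365, -23.9963)),
    "b": ((-1.4538, 13.4628, -43.4278, 40.7081),
          (-7.7071, 23.0993, -20.5561, 4.7883)),
    "c": ((0.2232, -2.1593, 6.6964, -6.0805),
          (0.7064, -1.9679, 1.5214, -0.2153)),
}


def cubic(coeffs, kt):
    """Evaluate x0 + x1*K_t + x2*K_t^2 + x3*K_t^3."""
    return sum(x * kt**power for power, x in enumerate(coeffs))


def cluster_1_delta_kn(kt, air_mass):
    """Equation (34) with the Cluster 1 coefficients of Equation (36)."""
    branch = 0 if kt < 0.60 else 1
    a = cubic(CLUSTER_1["a"][branch], kt)
    b = cubic(CLUSTER_1["b"][branch], kt)
    c = cubic(CLUSTER_1["c"][branch], kt)
    return a + b * air_mass + c * air_mass**2


def cluster_1_dni(kt, air_mass, g0n, knc):
    """Equations (19) and (20): DNI = G_0n * (K_nc - delta K_n)."""
    return g0n * (knc - cluster_1_delta_kn(kt, air_mass))


delta_kn = cluster_1_delta_kn(kt=0.5, air_mass=1.5)
```

The model was validated for $0.175 \leq K_{t} \leq 0.875$, so inputs outside that range fall outside what the fitted coefficients were tested on.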

3.2.2. Cluster 2

Cluster 2 consists of the CSIR, CUT, UBG, UFS, UPR, and UNV datasets. Figure 6 shows the Cluster 2 and six stations’ a, b and c coefficients.

The discussion of the different stations are in Appendix A under Subsections A.1 (CSIR), A.2 (CUT), A.14 (UBG), A.15 (UFS), A.18 (UPR) and A.16 (UNV). The UFS have the greatest deviation from the Cluster 2 fit.

Figure 6. Cluster 2 coefficients in Δ K_{n} = a + b \cdot A M + c \cdot A M^{2}.


The coefficients for Cluster 2 are

\begin{matrix} a = \{\begin{matrix} 1.4234 - 6.6647 K_{t} + 25.5673 K_{t}^{2} - 27.8939 K_{t}^{3} for K_{t} < 0.60, \\ 10.5939 - 28.4856 K_{t} + 21.7611 K_{t}^{2} - 3.348 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ b = \{\begin{matrix} - 0.8033 + 7.723 K_{t} - 26.9074 K_{t}^{2} + 25.9995 K_{t}^{3} for K_{t} < 0.60, \\ - 6.1286 + 15.6674 K_{t} - 9.4832 K_{t}^{2} - 0.4949 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ c = \{\begin{matrix} 0.1604 - 1.5968 K_{t} + 5.0219 K_{t}^{2} - 4.537 K_{t}^{3} for K_{t} < 0.60, \\ 0.8966 - 2.6121 K_{t} + 2.2392 K_{t}^{2} - 0.4802 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \end{matrix}

(37)

3.2.3. Cluster 3

Cluster 3 consists of the KZH, KZW and UNZ datasets. Figure 7 shows the Cluster 3 and three stations’ a, b and c coefficients.

The discussion of the different stations is in Appendix A under Subsections A.7 (KZH), A.8 (KZW) and A.17 (UNZ). The three station models fit well and are similar to the Cluster 3 model.

Figure 7. Cluster 3 coefficients in Δ K_{n} = a + b \cdot A M + c \cdot A M^{2}.


The coefficients for Cluster 3 are

\begin{matrix} a = \{\begin{matrix} 1.7678 - 9.8995 K_{t} + 34.9076 K_{t}^{2} - 36.2495 K_{t}^{3} for K_{t} < 0.60, \\ 41.1735 - 157.5062 K_{t} + 201.1335 K_{t}^{2} - 85.5391 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ b = \{\begin{matrix} - 1.0914 + 10.2302 K_{t} - 33.9924 K_{t}^{2} + 32.4023 K_{t}^{3} for K_{t} < 0.60, \\ - 31.151 + 121.964 K_{t} - 158.1413 K_{t}^{2} + 67.9737 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ c = \{\begin{matrix} 0.2256 - 2.1961 K_{t} + 6.8092 K_{t}^{2} - 6.1997 K_{t}^{3} for K_{t} < 0.60, \\ 3.4215 - 13.4513 K_{t} + 17.5277 K_{t}^{2} - 7.5726 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \end{matrix}

(38)

3.2.4. Cluster 4

Cluster 4 consists of the NMU, FRH and GRT datasets. Figure 8 shows the Cluster 4 and three stations’ a, b and c coefficients.

The discussion of the different stations is in Appendix A under Subsections A.10 (NMU), A.3 (FRH) and A.4 (GRT). The GRT station’s c-coefficient is difficult to fit.

Figure 8. Cluster 4 coefficients in Δ K_{n} = a + b \cdot A M + c \cdot A M^{2}.


The coefficients for Cluster 4 are

\begin{matrix} a = \{\begin{matrix} 0.7671 - 0.9387 K_{t} + 9.7073 K_{t}^{2} - 14.2827 K_{t}^{3} for K_{t} < 0.60, \\ 20.0495 - 67.0086 K_{t} + 73.6919 K_{t}^{2} - 26.4669 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ b = \{\begin{matrix} - 0.2382 + 2.3908 K_{t} - 11.3676 K_{t}^{2} + 12.3585 K_{t}^{3} for K_{t} < 0.60, \\ - 12.5095 + 42.6536 K_{t} - 47.1486 K_{t}^{2} + 16.8022 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ c = \{\begin{matrix} 0.0451 - 0.5012 K_{t} + 1.8485 K_{t}^{2} - 1.7706 K_{t}^{3} for K_{t} < 0.60, \\ 1.4976 - 5.1597 K_{t} + 5.8075 K_{t}^{2} - 2.1264 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \end{matrix}

(39)

3.3. Regional Decomposition Model

The data for the regional (Southern African) decomposition model is drawn evenly from the SAURAN stations in terms of the number of data points used per station. The dataset represents multiple climates, elevations and pollution levels, leading to a better decomposition model for Southern Africa and for regional application.

Figure 9 shows the coefficients a, b and c of the regional model and the four clusters.

The coefficients for the regional model are

\begin{matrix} a = \{\begin{matrix} 1.2893 - 5.2531 K_{t} + 21.5081 K_{t}^{2} - 24.5156 K_{t}^{3} for K_{t} < 0.60, \\ 19.0295 - 63.9357 K_{t} + 70.7485 K_{t}^{2} - 25.6524 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ b = \{\begin{matrix} - 0.6327 + 5.891 K_{t} - 21.4431 K_{t}^{2} + 21.2593 K_{t}^{3} for K_{t} < 0.60, \\ - 11.7813 + 39.923 K_{t} - 43.6596 K_{t}^{2} + 15.3252 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ c = \{\begin{matrix} 0.1118 - 1.1105 K_{t} + 3.6089 K_{t}^{2} - 3.3222 K_{t}^{3} for K_{t} < 0.60, \\ 1.49 - 5.2009 K_{t} + 5.9381 K_{t}^{2} - 2.2137 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \end{matrix}

(40)
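Because each coefficient in Equation (40) is defined by two separate cubic polynomials, a reader may wish to check how closely the two branches agree at the boundary K_{t} = 0.60. The short Python sketch below evaluates both branches of the regional coefficients at the boundary and prints the difference. The names `REGIONAL`, `cubic` and `branch_jump` are hypothetical and introduced only for this illustration.

```python
# Regional model coefficients from Equation (40), lowest order first:
# (constant, K_t, K_t^2, K_t^3) for K_t < 0.60, then for K_t >= 0.60.
REGIONAL = {
    "a": ((1.2893, -5.2531, 21.5081, -24.5156),
          (19.0295, -63.9357, 70.7485, -25.6524)),
    "b": ((-0.6327, 5.891, -21.4431, 21.2593),
          (-11.7813, 39.923, -43.6596, 15.3252)),
    "c": ((0.1118, -1.1105, 3.6089, -3.3222),
          (1.49, -5.2009, 5.9381, -2.2137)),
}
KT_BREAK = 0.60  # boundary between the two K_t intervals


def cubic(coeffs, kt):
    """Evaluate c0 + c1*kt + c2*kt^2 + c3*kt^3."""
    c0, c1, c2, c3 = coeffs
    return c0 + c1 * kt + c2 * kt ** 2 + c3 * kt ** 3


def branch_jump(name, table=REGIONAL):
    """Upper-branch value minus lower-branch value at the boundary."""
    low, high = table[name]
    return cubic(high, KT_BREAK) - cubic(low, KT_BREAK)


for name in ("a", "b", "c"):
    print(f"{name}: jump at K_t = {KT_BREAK:.2f} is {branch_jump(name):+.4f}")
```

The same check can be applied to any cluster or localised model by passing a coefficient table with the same layout.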

Figure 9. Regional model coefficients in Δ K_{n} = a + b \cdot A M + c \cdot A M^{2}.


4. Results

Each station is discussed individually by assessing the dataset’s comparison metrics: the R^{2}-value, MBE, RMSE and MAE, and the MAE of two K_{t}-intervals. The results compare the localised, clustered and regional (Southern African) models to the three baseline models, DISC, Dirint and Lee. The tables visualise the results for each station using red and green, with green denoting lower error and red denoting higher error.
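The comparison metrics can be sketched in Python as follows. This is a minimal illustration using the standard textbook definitions in absolute units; the exact definitions used in this study (for example, any normalisation to percentages) are those given in the methodology and may differ. The function name `comparison_metrics`, the argument names and the sign convention (estimate minus measurement, so that a positive MBE indicates overestimation) are assumptions, as is the split of the two K_{t}-intervals at 0.60.

```python
import numpy as np


def comparison_metrics(dni_measured, dni_estimated, kt, kt_break=0.60):
    """Return R^2, MBE, RMSE, MAE and the MAE of two K_t intervals."""
    y = np.asarray(dni_measured, dtype=float)
    y_hat = np.asarray(dni_estimated, dtype=float)
    kt = np.asarray(kt, dtype=float)

    err = y_hat - y                      # positive values mean overestimation
    ss_res = np.sum(err ** 2)            # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2) # total sum of squares
    low = kt < kt_break                  # samples in the lower K_t interval

    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MBE": err.mean(),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAE": np.abs(err).mean(),
        "MAE_low_kt": np.abs(err[low]).mean(),
        "MAE_high_kt": np.abs(err[~low]).mean(),
    }
```

Each interval is assumed to contain at least one sample; an empty interval would yield NaN for its MAE.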

Table 3 discusses the validation data. In the previous section, the localised, clustered, and regional models were empirically determined. Appendix A expands on the equations for the localised models.

Section 3.2 and Section 3.3 discussed the clustered and regional models. The test data also introduces two unknown datasets, the ILA and MIN datasets. These datasets assess the developed models with data that is new to them. ILA and MIN have no localised model, but geographically, they fall within a cluster: ILA falls under Cluster 1 and MIN under Cluster 2.

4.1. Testing and Validation Results

4.1.1. CSIR

Section A.1 shows the decomposition model equations for the CSIR station. Table 5 shows the results of the CSIR station. The results show that the localised, Cluster 2 and regional models outperform the baseline models in all metrics. The localised model improves significantly for lower K_{t}, reducing the MAE from around 60% to 36%.

The test results of the CSIR dataset are presented in Figure A1. As seen in the figure, the localised, cluster, and regional models outperform the baseline models, consistent with the validation results of the previous section in Table 5.

4.1.2. CUT

Section A.2 shows the decomposition model equations for the CUT station. Table 6 shows the CUT station results. The localised, Cluster 2 and regional models significantly improve the comparison metrics over the three baseline models. The Lee model has a similar MBE to the regional model (±0.7) and an MAE for the higher K_{t} interval similar to that of Cluster 2. However, the Lee RMSE and MAE still do not outperform the new models.

Figure A2 presents the test results of the CUT dataset, where the localised, cluster and regional models outperform the baseline models. The test results are consistent with the validation results presented in Table 6.

4.1.3. FRH

Section A.3 shows the decomposition model equations for the FRH station. Table 7 shows the results of the FRH station. The localised model outperforms the baseline models by improving R^{2} and MBE and reducing MAE and RMSE. The Lee model shows the lowest MAE for higher K_{t}-values; however, its higher MBE shows that it overestimates the DNI. For most metrics, the localised, Cluster 4 and regional models outperform the baseline models.

Figure A3 presents the test results of the FRH dataset. The localised, Cluster 4 and regional models outperform the baselines, but no significant difference exists between the three new models. The test results presented in Figure A3 correspond with the validation results in Table 7.

4.1.4. GRT

Section A.4 shows the decomposition model equations for the GRT station. Table 8 shows the GRT station results. The localised model improves on the DISC and Dirint models but does not significantly outperform the Lee model. The Lee model has a higher R^{2} and a lower MBE and RMSE, whereas the localised model has a lower MAE for the entire dataset and the two K_{t} intervals. The Cluster 4 and regional models perform better than the DISC and Dirint models but do not significantly outperform all the baseline models.

Figure A4 shows the test results of the GRT dataset. The results correspond with the validation results in Table 8. The localised, Cluster 4 and regional models outperform the DISC and Dirint models but only marginally outperform the Lee model.

4.1.5. HLO

Section A.5 shows the decomposition model equations for the HLO station. Table 9 shows the HLO station results. The localised model performs better than the baseline models and improves all comparison metrics.

Figure A5 shows the test results of the HLO dataset. The validation results in Table 9 and the test results correspond well, indicating that the localised, Cluster 1 and regional models outperform the baseline models.

4.1.6. ILA

Figure A6 presents the test results of the ILA dataset. The ILA dataset has no localised decomposition model; therefore, the testing only assesses the Cluster 1 and regional models. The results show that the Cluster 1 and regional models outperform the baseline models. The results highlight that a cluster model can substitute for a localised model when none is available, provided the site lies geographically within the cluster area.

4.1.7. KZH

Section A.7 shows the decomposition model equations for the KZH station. Table 10 shows the results of the KZH station. The localised, Cluster 3 and regional models all show significant improvements in reducing the error over the baseline models. The DISC has a lower MBE than the regional model.

Figure A7 shows the test results of the KZH dataset. The localised, Cluster 3 and regional models all outperform the baseline models. The regional model does not outperform Cluster 3 or the localised model significantly.

4.1.8. KZW

Section A.8 shows the decomposition model equations for the KZW station. Table 11 shows the results of the KZW station. The localised, clustered, and regional models show improvement over the baseline models for the metrics that assess the entire dataset.

Figure A8 shows the test results of the KZW dataset. The validation and testing results from Table 11 correspond.

MIN

The test results of the MIN dataset are presented in Figure A9. MIN has no localised decomposition model and falls geographically under Cluster 2. The cluster model and regional model show improvement over the baseline models. Much like the ILA dataset, the MIN dataset demonstrates how the clustered and regional models can serve as alternatives to enhance DNI estimations in Southern Africa.

NMU

Section A.10 shows the decomposition model equations for the NMU station. Table 12 shows the NMU station results. The localised, Cluster 4 and regional models show significant improvement in reducing the errors from the baseline models. Based on the higher MBE, the Cluster 4 and regional models overestimate the DNI more than the DISC and Dirint models.

The test results of the NMU dataset are presented in Figure A10. Localised and cluster models outperform baseline models, consistent with the results in Table 12.

4.1.9. NUST

Section A.11 shows the decomposition model equations for the NUST station. Table 13 shows the results of the NUST station. The localised model shows superior performance over the baseline models, as well as the clustered and regional models. Compared to the baselines, the metrics indicate that the regional model slightly overestimates the DNI, whereas the best-performing baseline model (Lee) slightly underestimates it.

The test results of the NUST dataset are presented in Figure A11. Localised, clustered, and regional models outperform the baseline models, consistent with the validation results presented in Table 13. The regional model marginally underperforms the localised and Cluster 1 models, but not by enough to make it unusable.

4.1.10. RVD

Section A.12 shows the decomposition model equations for the RVD station. Table 14 shows the RVD station results. The localised, clustered, and regional models outperform the baseline models. The Lee model performs better than the regional model but does not outperform the localised and cluster models. The RVD station receives more irradiance on average than other stations in the SAURAN database.

Figure A12 shows the test results of the RVD dataset. The results indicate that the localised, cluster and regional models outperform the baseline models, which is consistent with the validation results of the previous section in Table 14. The localised model’s RMSE is higher than the Lee model; however, the localised model does best in reducing the error for the other metrics. Though the regional model outperforms the baseline models, it does show the worst performance of the three newly developed models for RVD.

4.1.11. SUN

Section A.13 shows the decomposition model equations for the SUN station. Table 15 shows the SUN station results. The localised model outperforms the baseline models by improving R^{2} and reducing the MBE, RMSE and MAE. The Cluster 1 and regional models show a slightly worse MBE than the Lee baseline model but otherwise outperform the baseline models. The Lee model also predicts higher K_{t} points with a lower MAE than the regional model; however, the other metrics indicate that the regional model shows better results overall.

Figure A13 shows the test results of the SUN dataset. The results indicate that the localised, cluster and regional models outperform the baseline models, which is consistent with the validation results of the previous section in Table 15. As with the validation results, the regional model is the worst-performing new model but still outperforms the baseline models.

4.1.12. UBG

Section A.14 shows the decomposition model equations for the UBG station. Table 16 shows the UBG station results. The localised, clustered and regional models all outperform the baseline models. The Lee model has a lower MBE than the Cluster 2 and regional models. The Lee model also has a lower MAE for K_{t} \geq 0.60; however, the other metrics indicate that the Lee model does not improve the R^{2}, RMSE, overall MAE and K_{t} < 0.60 MAE.

Figure A14 shows the test results of the UBG dataset. The test results show that the localised, Cluster 2 and regional models outperform the baseline models. The Lee model has a lower MBE than the regional model, consistent with the validation results in Table 16.

4.1.13. UFS

Section A.15 shows the decomposition model equations for the UFS station. Table 17 shows the UFS station results. The localised, Cluster 2 and regional models outperform the baseline models. The Lee model, which underestimates the DNI, has a slightly better MBE than the Cluster 2 model.

Figure A15 shows the test results of the UFS dataset. All three new decomposition models significantly improve the errors compared to the baseline models, consistent with the validation results in Table 17.

4.1.14. UNV

Section A.16 shows the decomposition model equations for the UNV station. Table 18 shows the UNV station results. The localised, Cluster 2 and regional models significantly improve over the baseline models. The Cluster 2 and regional models overestimate the DNI more than the DISC model, based on the MBE.

Figure A16 shows the test results of the UNV dataset. The test results correspond with the validation results in Table 18, where the localised, Cluster 2 and regional models outperform the baseline models. The only exception is the MBE, where the Cluster 2 and regional models perform worse than the DISC model. Considering all the metrics, the new models outperform the baselines in reducing the overall error of DNI estimations.

4.1.15. UNZ

Section A.17 shows the decomposition model equations for the UNZ station. Table 19 shows the results of the UNZ station. The localised, clustered and regional models all show improvement over the baselines. The Dirint model has a lower MBE than the regional model.

Figure A17 shows the test results of the UNZ dataset, which correspond with the validation results in Table 19.

4.1.16. UPR

Section A.18 shows the decomposition model equations for the UPR station. Table 20 shows the UPR station results. The localised, cluster and regional models outperform the baseline models.

Figure A18 shows the test results of the UPR dataset. The comparison metrics of the entire dataset indicate that the localised, cluster and regional models outperform the baseline models, which is consistent with the results of the validation dataset in Table 20.

4.1.17. VAN

Section A.19 shows the decomposition model equations for the VAN station. Table 21 shows the results of the VAN station. The localised model outperforms the baseline models by improving R^{2} and reducing the MBE, RMSE and MAE. The new models significantly reduce the MAE for higher K_{t} compared to the baseline models. Even when outperforming the baseline models, the regional model performs the worst of the new models. The VAN station receives a very high average DNI and GHI and lower DHI than the rest of the database’s stations. These results are similar to the RVD station, which has significantly higher irradiance levels than the other stations.

Figure A19 shows the test results of the VAN dataset. The new models all outperform the baseline models. The regional model shows the worst performance of the new models, even when outperforming the baseline models, similar to the RVD station that receives more irradiance on average compared to the other stations. These results are consistent with the validation results in Table 21.

4.2. Discussion

Table 22 summarises the performance of the localised, clustered and regional models for both the test and validation sets.

As expected, the localised models outperformed the baseline models for all station datasets because of the site-specific climatic training data. As discussed in the previous section, the cluster model combines multiple stations in a similar geographical area.

The clustered model therefore has significantly more data from which to train a model. A clustered model is ideal if a site has no data for localised model development using the discussed methodology. The regional (Southern African) model also shows improvement over the baseline models, indicating that this model may be appropriate for adoption as a new model for Southern Africa. The two stations with no localised model (ILA and MIN) showed improvement using the clustered and regional models in the validation study.
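One way to act on this discussion is a simple fallback rule: use the localised model where one exists, otherwise the cluster model, otherwise the regional model. The Python sketch below encodes the station-to-cluster assignments given in Section 3.2 and Section 4. The priority order and the names `CLUSTER_OF_STATION`, `STATIONS_WITHOUT_LOCAL_MODEL` and `select_model` are assumptions made for this illustration, not a procedure prescribed by the study.

```python
# Cluster membership as listed in Section 3.2; ILA and MIN are assigned
# geographically (Section 4) and have no localised model.
CLUSTER_OF_STATION = {
    "HLO": 1, "NUST": 1, "RVD": 1, "SUN": 1, "VAN": 1, "ILA": 1,
    "CSIR": 2, "CUT": 2, "UBG": 2, "UFS": 2, "UPR": 2, "UNV": 2, "MIN": 2,
    "KZH": 3, "KZW": 3, "UNZ": 3,
    "NMU": 4, "FRH": 4, "GRT": 4,
}
STATIONS_WITHOUT_LOCAL_MODEL = {"ILA", "MIN"}


def select_model(station):
    """Pick a model tier for a station (assumed order: local, cluster, regional)."""
    cluster = CLUSTER_OF_STATION.get(station)
    if cluster is None:
        # Unknown site: fall back to the Southern African model.
        return ("regional", "Southern Africa")
    if station in STATIONS_WITHOUT_LOCAL_MODEL:
        return ("cluster", cluster)
    return ("localised", station)


print(select_model("SUN"))   # station with its own localised model
print(select_model("ILA"))   # station covered only by a cluster model
print(select_model("XYZ"))   # hypothetical site outside the station list
```

For a new site inside a cluster area, the cluster would in practice be chosen from its geographical location rather than from a station code.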

5. Conclusion

This article presented the development of new decomposition models for hourly DNI estimations in Southern Africa. The new models improve on the DISC model [17] by developing new decomposition models localised for Southern African climates. The new decomposition models reduced the DNI estimation errors relative to the baselines for all validation and test sets.

The results indicate that a localised model will improve the estimations of DNI. The proposed methodology can be helpful for the development of local decomposition models for other areas worldwide.

Clustered models indicate that grouping data based on similar geographical and climatic properties can also improve the performance of decomposition models. A clustered decomposition model could therefore be helpful when no local model exists or local data is limited, but data is available from two or more geographically close stations.

The overall model, the regional decomposition model, encompasses different climatic regions and geographical locations. There are some exceptions where the model over- or underestimates the DNI; however, the overall metrics indicate that the Southern African model significantly improves over the baseline models.

The study validates hourly irradiance data for K_{t}-intervals between 0.175 and 0.875. Recommendations for future work include developing models for higher and lower K_{t} values and models for higher temporal resolutions with increased accuracy, which is ideal for real-time monitoring and short-term forecasting of PV power.
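When applying the models, it may be useful to flag samples that fall outside the validated K_{t} range. The following Python sketch replaces out-of-range values with NaN so that they are not passed to a decomposition model. Treating both bounds as inclusive, and the name `mask_to_validated_range`, are assumptions made for this illustration.

```python
import numpy as np

KT_MIN, KT_MAX = 0.175, 0.875  # validated K_t range reported in this study


def mask_to_validated_range(kt):
    """Return K_t with values outside [KT_MIN, KT_MAX] replaced by NaN."""
    kt = np.asarray(kt, dtype=float)
    inside = (kt >= KT_MIN) & (kt <= KT_MAX)
    return np.where(inside, kt, np.nan)


print(mask_to_validated_range([0.10, 0.175, 0.50, 0.90]))
```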

An accurate decomposition model can open the path for developing countries to expand their use of renewable energy sources and reduce their dependence on coal and fossil fuels. Good-quality data is needed to ensure the progress of solar energy research and development. The next step is assessing the decomposition models with well-known transposition models to determine improved accuracy for PR estimations.

Author Contributions

Conceptualisation, F.D.; methodology, F.D.; software, F.D.; validation, F.D.; formal analysis, F.D.; investigation, F.D.; resources, F.D. and A.R.; data curation, F.D.; writing—original draft preparation, F.D.; writing—review and editing, F.D. and A.R.; visualisation, F.D.; supervision, A.R.; project administration, F.D. and A.R.; funding acquisition, F.D. and A.R. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Localised Decomposition Models

Appendix A.1. CSIR

The a, b and c coefficients for Equation (34) for the CSIR station are

\begin{matrix} a & = \{\begin{matrix} 0.6472 - 1.0883 K_{t} + 12.3429 K_{t}^{2} - 17.7027 K_{t}^{3} for K_{t} < 0.60, \\ 6.1444 - 9.484 K_{t} - 5.138 K_{t}^{2} + 9.2371 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ b & = \{\begin{matrix} - 0.5032 + 5.67 K_{t} - 22.0707 K_{t}^{2} + 22.2654 K_{t}^{3} for K_{t} < 0.60, \\ 2.8669 - 22.2584 K_{t} + 43.4833 K_{t}^{2} - 24.9682 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ c & = \{\begin{matrix} 0.1418 - 1.4904 K_{t} + 4.7805 K_{t}^{2} - 4.3489 K_{t}^{3} for K_{t} < 0.60, \\ - 1.1287 + 5.8459 K_{t} - 9.4568 K_{t}^{2} + 4.8729 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \end{matrix}

(A1)

The test results of the CSIR dataset are in Figure A1.

Figure A1. Hourly test results of decomposition model development for CSIR.

Appendix A.2. CUT

The a, b and c coefficients for Equation (34) for the CUT station are

\begin{matrix} a & = \{\begin{matrix} 2.4122 - 15.382 K_{t} + 48.3773 K_{t}^{2} - 45.8766 K_{t}^{3} for K_{t} < 0.60, \\ - 15.7689 + 86.1878 K_{t} - 141.6902 K_{t}^{2} + 73.2142 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ b & = \{\begin{matrix} - 1.6214 + 14.8897 K_{t} - 45.5364 K_{t}^{2} + 40.6731 K_{t}^{3} for K_{t} < 0.60, \\ 17.2881 - 85.5492 K_{t} + 134.0146 K_{t}^{2} - 67.4052 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ c & = \{\begin{matrix} 0.2864 - 2.6906 K_{t} + 7.8329 K_{t}^{2} - 6.7318 K_{t}^{3} for K_{t} < 0.60, \\ - 1.9147 + 9.6635 K_{t} - 15.3053 K_{t}^{2} + 7.7563 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \end{matrix}

(A2)

The test results of the CUT dataset are in Figure A2.

Figure A2. Hourly test results of decomposition model development for CUT.

Appendix A.3. FRH

The a, b and c coefficients for Equation (34) for the FRH station are

\begin{matrix} a & = \{\begin{matrix} 1.3199 - 6.0578 K_{t} + 24.6953 K_{t}^{2} - 27.3825 K_{t}^{3} for K_{t} < 0.60, \\ 33.3782 - 124.5221 K_{t} + 155.5513 K_{t}^{2} - 64.7203 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ b & = \{\begin{matrix} - 0.5658 + 5.6247 K_{t} - 21.3334 K_{t}^{2} + 21.2952 K_{t}^{3} for K_{t} < 0.60, \\ - 26.2959 + 102.0318 K_{t} - 131.6271 K_{t}^{2} + 56.3592 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ c & = \{\begin{matrix} 0.0963 - 1.024 K_{t} + 3.5042 K_{t}^{2} - 3.2887 K_{t}^{3} for K_{t} < 0.60, \\ 3.9117 - 15.5175 K_{t} + 20.4822 K_{t}^{2} - 8.9754 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \end{matrix}

(A3)

The test results of the FRH dataset are in Figure A3.

Figure A3. Hourly test results of decomposition model development for FRH.

Appendix A.4. GRT

The a, b and c coefficients for Equation (34) for the GRT station are

\begin{matrix} a = \{\begin{matrix} 2.9602 - 19.6392 K_{t} + 58.9738 K_{t}^{2} - 55.4505 K_{t}^{3} for K_{t} < 0.60, \\ - 29.0697 + 138.0498 K_{t} - 209.2486 K_{t}^{2} + 102.455 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ b = \{\begin{matrix} - 2.2843 + 20.1509 K_{t} - 58.384 K_{t}^{2} + 51.362 K_{t}^{3} for K_{t} < 0.60, \\ 34.8693 - 155.9554 K_{t} + 227.6821 K_{t}^{2} - 108.7518 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ c = \{\begin{matrix} 0.4656 - 4.1668 K_{t} + 11.5129 K_{t}^{2} - 9.7101 K_{t}^{3} for K_{t} < 0.60, \\ - 8.4664 + 36.6029 K_{t} - 51.9382 K_{t}^{2} + 24.2327 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \end{matrix}

(A4)

Figure A4 shows the test results of the GRT dataset.

Figure A4. Hourly test results of decomposition model development for GRT.

Appendix A.5. HLO

The a, b and c coefficients for Equation (34) for the HLO station are

\begin{matrix} a = \{\begin{matrix} 3.1156 - 21.0151 K_{t} + 62.5312 K_{t}^{2} - 57.2369 K_{t}^{3} for K_{t} < 0.60, \\ 50.8058 - 194.5525 K_{t} + 248.2624 K_{t}^{2} - 105.3977 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ b = \{\begin{matrix} - 2.0533 + 18.2104 K_{t} - 53.4859 K_{t}^{2} + 46.7754 K_{t}^{3} for K_{t} < 0.60, \\ - 33.1822 + 128.4612 K_{t} - 164.8983 K_{t}^{2} + 70.2185 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ c = \{\begin{matrix} 0.3508 - 3.2108 K_{t} + 9.1319 K_{t}^{2} - 7.7849 K_{t}^{3} for K_{t} < 0.60, \\ 3.7247 - 14.4408 K_{t} + 18.5853 K_{t}^{2} - 7.9382 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \end{matrix}

(A5)

Figure A5 shows the test results of the HLO dataset.

Figure A5. Hourly test results of decomposition model development for HLO.

Appendix A.6. ILA

There is no decomposition model developed for the ILA station.

The test results of the ILA dataset are in Figure A6.

Figure A6. Hourly test results of decomposition model development for ILA.

Appendix A.7. KZH

The a, b and c coefficients for Equation (34) for the KZH station are

\begin{matrix} a = \{\begin{matrix} 1.3444 - 6.1333 K_{t} + 23.7139 K_{t}^{2} - 25.5604 K_{t}^{3} for K_{t} < 0.60, \\ 56.4073 - 216.3047 K_{t} + 276.4264 K_{t}^{2} - 117.5281 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ b = \{\begin{matrix} - 0.7156 + 6.8854 K_{t} - 24.0569 K_{t}^{2} + 22.9604 K_{t}^{3} for K_{t} < 0.60, \\ - 45.6353 + 178.5713 K_{t} - 231.593 K_{t}^{2} + 99.6179 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ c = \{\begin{matrix} 0.1646 - 1.6569 K_{t} + 5.2451 K_{t}^{2} - 4.7368 K_{t}^{3} for K_{t} < 0.60, \\ 6.2156 - 24.5162 K_{t} + 32.0741 K_{t}^{2} - 13.9205 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \end{matrix}

(A6)

The test results of the KZH dataset are in Figure A7.

Figure A7. Hourly test results of decomposition model development for KZH.

Appendix A.8. KZW

The a, b and c coefficients for Equation (34) for the KZW station are

\begin{matrix} a = \{\begin{matrix} 2.2627 - 14.4806 K_{t} + 47.1629 K_{t}^{2} - 46.2386 K_{t}^{3} for K_{t} < 0.60, \\ 35.0603 - 132.558 K_{t} + 167.2728 K_{t}^{2} - 70.2359 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ b = \{\begin{matrix} - 1.4332 + 13.4317 K_{t} - 42.4822 K_{t}^{2} + 39.1838 K_{t}^{3} for K_{t} < 0.60, \\ - 28.1708 + 109.9004 K_{t} - 141.9142 K_{t}^{2} + 60.6943 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ c = \{\begin{matrix} 0.2678 - 2.6116 K_{t} + 7.9363 K_{t}^{2} - 7.1072 K_{t}^{3} for K_{t} < 0.60, \\ 3.1759 - 12.4602 K_{t} + 16.1949 K_{t}^{2} - 6.9726 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \end{matrix}

(A7)

Figure A8 shows the test results of the KZW dataset.

Figure A8. Hourly test results of decomposition model development for KZW.

Appendix A.9. MIN

There is no decomposition model developed for the MIN station.

The test results of the MIN dataset are in Figure A9.

Figure A9. Hourly test results of decomposition model development for MIN.

Appendix A.10. NMU

The a, b and c coefficients for Equation (34) for the NMU station are

\begin{matrix} a = \{\begin{matrix} 0.0688 + 5.2168 K_{t} - 7.122 K_{t}^{2} + 0.2163 K_{t}^{3} for K_{t} < 0.60, \\ 15.7683 - 48.8606 K_{t} + 48.5956 K_{t}^{2} - 15.0438 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ b = \{\begin{matrix} 0.3483 - 2.9436 K_{t} + 3.2778 K_{t}^{2} - 0.0145 K_{t}^{3} for K_{t} < 0.60, \\ - 11.0783 + 37.046 K_{t} - 39.8763 K_{t}^{2} + 13.6687 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ c = \{\begin{matrix} - 0.0758 + 0.598 K_{t} - 1.1265 K_{t}^{2} + 0.7025 K_{t}^{3} for K_{t} < 0.60, \\ 1.9782 - 7.2272 K_{t} + 8.7285 K_{t}^{2} - 3.4834 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \end{matrix}

(A8)

The test results of the NMU dataset are in Figure A10.

Figure A10. Hourly test results of decomposition model development for NMU.

Appendix A.11. NUST

The a, b and c coefficients for Equation (34) for the NUST station are

\begin{matrix} a & = \{\begin{matrix} 2.0021 - 11.1478 K_{t} + 36.3216 K_{t}^{2} - 35.8544 K_{t}^{3} for K_{t} < 0.60, \\ - 17.9917 + 91.7087 K_{t} - 144.5222 K_{t}^{2} + 72.3519 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ b & = \{\begin{matrix} - 1.2658 + 11.3567 K_{t} - 35.6964 K_{t}^{2} + 32.4545 K_{t}^{3} for K_{t} < 0.60, \\ 24.1237 - 111.1616 K_{t} + 165.4905 K_{t}^{2} - 79.997 K_{t}^{3} for K_{t} \geq 0.60, \end{matrix} \\ c & = \{\begin{matrix} 0.2056 - 1.9277 K_{t} + 5.7745 K_{t}^{2} - 5.0518 K_{t}^{3} for K_{t} < 0.60, \\ - 2.59 + 11.8847 K_{t} - 17.6199 K_{t}^{2} + 8.4894 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \end{matrix}

(A9)

The test results of the NUST dataset are in Figure A11.

Figure A11. Hourly test results of decomposition model development for NUST.

Appendix A.12. RVD

The a, b and c coefficients for Equation (34) for the RVD station are

\begin{matrix} a & = \{\begin{matrix} 4.0191 - 28.309 K_{t} + 83.0024 K_{t}^{2} - 74.0684 K_{t}^{3} for K_{t} < 0.60, \\ 84.2546 - 327.2114 K_{t} + 422.3587 K_{t}^{2} - 181.2345 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ b & = \{\begin{matrix} - 2.8895 + 25.4823 K_{t} - 74.043 K_{t}^{2} + 63.2378 K_{t}^{3} for K_{t} < 0.60, \\ - 65.5943 + 256.2577 K_{t} - 331.9301 K_{t}^{2} + 142.7166 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ c & = \{\begin{matrix} 0.5456 - 4.9199 K_{t} + 13.8139 K_{t}^{2} - 11.415 K_{t}^{3} for K_{t} < 0.60, \\ 10.4557 - 41.161 K_{t} + 53.7847 K_{t}^{2} - 23.3378 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \end{matrix}

(A10)

The test results of the RVD dataset are in Figure A12.

Figure A12. Hourly test results of decomposition model development for RVD.

Appendix A.13. SUN

The a, b and c coefficients for Equation (34) for the SUN station are

\begin{matrix} a = \{\begin{matrix} 1.4996 - 7.646 K_{t} + 30.0235 K_{t}^{2} - 33.6216 K_{t}^{3} for K_{t} < 0.60, \\ 32.6225 - 122.3879 K_{t} + 152.4023 K_{t}^{2} - 62.9667 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ b = \{\begin{matrix} - 0.835 + 7.9823 K_{t} - 28.5353 K_{t}^{2} + 28.4965 K_{t}^{3} for K_{t} < 0.60, \\ - 19.4403 + 73.2046 K_{t} - 90.6277 K_{t}^{2} + 36.9662 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ c = \{\begin{matrix} 0.1285 - 1.3014 K_{t} + 4.3214 K_{t}^{2} - 4.0834 K_{t}^{3} for K_{t} < 0.60, \\ 2.0866 - 7.7979 K_{t} + 9.595 K_{t}^{2} - 3.8902 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \end{matrix}

(A11)

The test results of the SUN dataset are in Figure A13.

Figure A13. Hourly test results of decomposition model development for SUN.

Appendix A.14. UBG

The a, b and c coefficients for Equation (34) for the UBG station are

\begin{matrix} a = \{\begin{matrix} 0.5601 + 1.4348 K_{t} + 1.9992 K_{t}^{2} - 6.4178 K_{t}^{3} for K_{t} < 0.60, \\ 42.1623 - 152.8269 K_{t} + 183.6564 K_{t}^{2} - 73.01 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ b = \{\begin{matrix} - 0.0378 + 0.6077 K_{t} - 6.2716 K_{t}^{2} + 7.0084 K_{t}^{3} for K_{t} < 0.60, \\ - 44.4913 + 167.9403 K_{t} - 209.5344 K_{t}^{2} + 86.5114 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \\ c = \{\begin{matrix} 0.0159 - 0.2605 K_{t} + 1.1245 K_{t}^{2} - 0.8877 K_{t}^{3} for K_{t} < 0.60, \\ 9.2841 - 35.956 K_{t} + 46.1639 K_{t}^{2} - 19.6586 K_{t}^{3} for K_{t} \geq 0.60 . \end{matrix} \end{matrix}

(A12)

The test results of the UBG dataset are in Figure A14.

Figure A14. Hourly test results of decomposition model development for UBG.

Appendix A.15. UFS

The a, b and c coefficients for Equation (34) for the UFS station are

\begin{aligned}
a &= \begin{cases} 1.1152 - 5.4355 K_{t} + 27.3687 K_{t}^{2} - 32.6276 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ 18.8962 - 58.0528 K_{t} + 56.4791 K_{t}^{2} - 16.8735 K_{t}^{3} & \text{for } K_{t} \geq 0.60. \end{cases} \\
b &= \begin{cases} - 0.5439 + 6.5875 K_{t} - 27.7113 K_{t}^{2} + 28.8812 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ - 19.1711 + 65.1123 K_{t} - 71.8324 K_{t}^{2} + 25.6916 K_{t}^{3} & \text{for } K_{t} \geq 0.60. \end{cases} \\
c &= \begin{cases} 0.1395 - 1.6147 K_{t} + 5.7356 K_{t}^{2} - 5.4904 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ 5.238 - 19.7369 K_{t} + 24.6715 K_{t}^{2} - 10.2415 K_{t}^{3} & \text{for } K_{t} \geq 0.60. \end{cases}
\end{aligned}

(A13)

The test results of the UFS dataset are in Figure A15.

Figure A15. Hourly test results of decomposition model development for UFS.

Appendix A.16. UNV

The a, b and c coefficients for Equation (34) for the UNV station are

\begin{aligned}
a &= \begin{cases} 1.6679 - 8.8496 K_{t} + 30.8258 K_{t}^{2} - 31.7801 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ 17.0947 - 58.7362 K_{t} + 67.3532 K_{t}^{2} - 25.6036 K_{t}^{3} & \text{for } K_{t} \geq 0.60. \end{cases} \\
b &= \begin{cases} - 0.9329 + 8.8795 K_{t} - 29.6546 K_{t}^{2} + 28.1554 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ - 11.0559 + 39.0859 K_{t} - 45.1859 K_{t}^{2} + 17.0921 K_{t}^{3} & \text{for } K_{t} \geq 0.60. \end{cases} \\
c &= \begin{cases} 0.1744 - 1.744 K_{t} + 5.4271 K_{t}^{2} - 4.9023 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ 0.7738 - 2.5106 K_{t} + 2.5888 K_{t}^{2} - 0.8303 K_{t}^{3} & \text{for } K_{t} \geq 0.60. \end{cases}
\end{aligned}

(A14)

The test results of the UNV dataset are in Figure A16.

Figure A16. Hourly test results of decomposition model development for UNV.

Appendix A.17. UNZ

The a, b and c coefficients for Equation (34) for the UNZ station are

\begin{aligned}
a &= \begin{cases} 1.1129 - 5.2859 K_{t} + 25.5284 K_{t}^{2} - 30.5998 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ 33.4179 - 127.1391 K_{t} + 161.8637 K_{t}^{2} - 68.7679 K_{t}^{3} & \text{for } K_{t} \geq 0.60. \end{cases} \\
b &= \begin{cases} - 0.6222 + 7.0597 K_{t} - 28.1112 K_{t}^{2} + 29.4187 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ - 24.1838 + 94.1721 K_{t} - 121.4204 K_{t}^{2} + 51.9098 K_{t}^{3} & \text{for } K_{t} \geq 0.60. \end{cases} \\
c &= \begin{cases} 0.1398 - 1.5931 K_{t} + 5.5902 K_{t}^{2} - 5.4749 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ 2.3838 - 9.2877 K_{t} + 11.9991 K_{t}^{2} - 5.1432 K_{t}^{3} & \text{for } K_{t} \geq 0.60. \end{cases}
\end{aligned}

(A15)

The test results of the UNZ dataset are in Figure A17.

Figure A17. Hourly test results of decomposition model development for UNZ.

Appendix A.18. UPR

The a, b and c coefficients for Equation (34) for the UPR station are

\begin{aligned}
a &= \begin{cases} 1.3766 - 6.4439 K_{t} + 25.7243 K_{t}^{2} - 28.9372 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ - 2.5908 + 24.5466 K_{t} - 49.3217 K_{t}^{2} + 28.3294 K_{t}^{3} & \text{for } K_{t} \geq 0.60. \end{cases} \\
b &= \begin{cases} - 0.8308 + 8.2061 K_{t} - 29.0534 K_{t}^{2} + 28.667 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ 9.0641 - 45.9866 K_{t} + 73.7132 K_{t}^{2} - 37.7903 K_{t}^{3} & \text{for } K_{t} \geq 0.60. \end{cases} \\
c &= \begin{cases} 0.1626 - 1.6496 K_{t} + 5.2698 K_{t}^{2} - 4.8501 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ - 1.7989 + 8.4045 K_{t} - 12.7021 K_{t}^{2} + 6.2414 K_{t}^{3} & \text{for } K_{t} \geq 0.60. \end{cases}
\end{aligned}

(A16)

The test results of the UPR dataset are in Figure A18.

Figure A18. Hourly test results of decomposition model development for UPR.

Appendix A.19. VAN

The a, b and c coefficients for Equation (34) for the VAN station are

\begin{aligned}
a &= \begin{cases} 1.8649 - 12.2743 K_{t} + 47.2823 K_{t}^{2} - 50.5904 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ 7.6031 - 11.9134 K_{t} - 6.9687 K_{t}^{2} + 12.3734 K_{t}^{3} & \text{for } K_{t} \geq 0.60. \end{cases} \\
b &= \begin{cases} - 1.1265 + 12.1632 K_{t} - 44.5245 K_{t}^{2} + 44.4602 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ 3.6296 - 27.7921 K_{t} + 54.161 K_{t}^{2} - 31.162 K_{t}^{3} & \text{for } K_{t} \geq 0.60. \end{cases} \\
c &= \begin{cases} 0.2277 - 2.5008 K_{t} + 8.4732 K_{t}^{2} - 8.0774 K_{t}^{3} & \text{for } K_{t} < 0.60, \\ - 1.3247 + 7.0528 K_{t} - 11.5818 K_{t}^{2} + 6.025 K_{t}^{3} & \text{for } K_{t} \geq 0.60. \end{cases}
\end{aligned}

(A17)

Figure A19 shows the test results of the VAN dataset.

Figure A19. Hourly test results of decomposition model development for VAN.

References

Masters, G. Renewable and Efficient Electric Power Systems, 2nd ed.; John Wiley and Sons, Inc.: Hoboken, New Jersey, 2013; pp. 186–245, 316–398.
Sengupta, M.; Habte, A.; Wilbert, S.; Gueymard, C.; Remund, J. Best Practices Handbook for the Collection and Use of Solar Resource Data for Solar Energy Applications: Third Edition 2021. [CrossRef]
Lee, H.; Kim, S.Y.; Yun, C. Comparison of Solar Radiation Models to Estimate Direct Normal Irradiance for Korea. Energies 2017, 10, 594. [CrossRef]
Bertrand, C.; Vanderveken, G.; Journée, M. Evaluation of decomposition models of various complexity to estimate the direct solar irradiance over Belgium. Renewable Energy 2015, 74, 618–626. [CrossRef]
Liu, B.Y.; Jordan, R.C. The interrelationship and characteristic distribution of direct, diffuse and total solar radiation. Solar Energy 1960, 4, 1–19. [CrossRef]
Yang, D.; Gueymard, C.A. Ensemble model output statistics for the separation of direct and diffuse components from 1-min global irradiance. Solar Energy 2020, 208, 591–603. [CrossRef]
Boland, J.; Huang, J.; Ridley, B. Decomposing global solar radiation into its direct and diffuse components. Renewable and Sustainable Energy Reviews 2013, 28, 749–756. [CrossRef]
Ridley, B.; Boland, J.; Lauret, P. Modelling of diffuse solar fraction with multiple predictors. Renewable Energy 2010, 35, 478–483. [CrossRef]
Soares, J.; Oliveira, A.P.; Božnar, M.Z.; Mlakar, P.; Escobedo, J.F.; Machado, A.J. Modeling hourly diffuse solar-radiation in the city of São Paulo using a neural-network technique. Applied Energy 2004, 79, 201–214. [CrossRef]
Talvitie, T.; Eldali, F.; Pinney, D. Predicting Solar Diffuse and Direct Components Using Deep Neural Networks. 2021 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT) 2021, pp. 1–5. [CrossRef]
Kalyanam, R.; Hoffmann, S. A Novel Approach to Enhance the Generalization Capability of the Hourly Solar Diffuse Horizontal Irradiance Models on Diverse Climates. Energies 2020, 13. [CrossRef]
Bessafi, M.; Oree, V.; Khoodaruth, A.; Chabriat, J.P. Impact of decomposition and kriging models on the solar irradiance downscaling accuracy in regions with complex topography. Renewable Energy 2020, 162, 1992–2003. [CrossRef]
Janjai, S.; Phaprom, P.; Wattan, R.; Masiri, I. Statistical models for estimating hourly diffuse solar radiation in different regions of Thailand. Proceedings of the International Conference on Energy and Sustainable Development: Issues and Strategies (ESD 2010), 2010, pp. 1–6. [CrossRef]
Orgill, J.; Hollands, K. Correlation equation for hourly diffuse radiation on a horizontal surface. Solar Energy 1977, 19, 357–359. [CrossRef]
Erbs, D.; Klein, S.; Duffie, J. Estimation of the diffuse radiation fraction for hourly, daily and monthly-average global radiation. Solar Energy 1982, 28, 293–302. [CrossRef]
Louche, A.; Notton, G.; Poggi, P.; Simonnot, G. Correlations for direct normal and global horizontal irradiation on a French Mediterranean site. Solar Energy 1991, 46. [CrossRef]
Maxwell, E. A quasi-physical model for converting hourly global horizontal to direct normal insolation. Technical report, Solar Energy Research Institute, 1987.
Perez, R.; Ineichen, P.; Maxwell, E.; Seals, R.; Zelenka, A. Dynamic global-to-direct irradiance conversion models. ASHRAE Transactions 1992, 98, 354–369.
Lave, M.; Hayes, W.; Pohl, A.; Hansen, C.W. Evaluation of Global Horizontal Irradiance to Plane-of-Array Irradiance Models at Locations Across the United States. IEEE Journal of Photovoltaics 2015, 5, 597–606. [CrossRef]
Lee, K.; Yoo, H.; Levermore, G.J. Quality control and estimation hourly solar irradiation on inclined surfaces in South Korea. Renewable Energy 2013, 57, 190–199. [CrossRef]
Skartveit, A.; Olseth, J.A. A model for the diffuse fraction of hourly global radiation. Solar Energy 1987, 38, 271–274. [CrossRef]
Lam, J.C.; Li, D.H. Correlation between global solar radiation and its direct and diffuse components. Building and Environment 1996, 31, 527–535. [CrossRef]
Reindl, D.; Beckman, W.; Duffie, J. Diffuse Fraction Correlations. Solar Energy 1990, 45, 1–7. [CrossRef]
Yao, W.; Li, Z.; Xiu, T.; Lu, Y.; Li, X. New decomposition models to estimate hourly global solar radiation from the daily value. Solar Energy 2015, 120, 87–99. [CrossRef]
Khalil, S.A.; Shaffie, A. A comparative study of total, direct and diffuse solar irradiance by using different models on horizontal and inclined surfaces for Cairo, Egypt. Renewable and Sustainable Energy Reviews 2013, 27, 853–863. [CrossRef]
Gueymard, C.A.; Ruiz-Arias, J.A. Extensive worldwide validation and climate sensitivity analysis of direct irradiance predictions from 1-min global irradiance. Solar Energy 2016, 128, 1–30. Special issue: Progress in Solar Energy. [CrossRef]
Laiti, L.; Giovannini, L.; Zardi, D.; Belluardo, G.; Moser, D. Estimating Hourly Beam and Diffuse Solar Radiation in an Alpine Valley: A Critical Assessment of Decomposition Models. Atmosphere 2018, 9. [CrossRef]
Oliveira, A.P.; Escobedo, J.; Machado, A.; Soares, J. Correlation models of diffuse solar-radiation applied to the city of São Paulo, Brazil. Applied Energy 2002, 71, 59–73. [CrossRef]
Salazar, G.; Pedrosa Filho, M. Analysis of the Diffuse Fraction from Solar Radiation Values Measured in Argentina and Brazil Sites. 2020. [CrossRef]
Chendo, M.; Maduekwe, A. Hourly global and diffuse radiation of Lagos, Nigeria—Correlation with some atmospheric parameters. Solar Energy 1994, 52, 247–251. [CrossRef]
Chikh, M.; Mahrane, A.; Haddadi, M. Modeling the Diffuse Part of the Global Solar Radiation in Algeria. Energy Procedia 2012, 18, 1068–1075. Terragreen 2012: Clean Energy Solutions for Sustainable Environment (CESSE). [CrossRef]
Benchrifa, M.; Tadili, R.; Idrissi, A.; Essalhi, H.; Mechaqrane, A. Development of New Models for the Estimation of Hourly Components of Solar Radiation: Tests, Comparisons, and Application for the Generation of a Solar Database in Morocco. International Journal of Photoenergy 2021, 2021, 1–16. [CrossRef]
Engerer, N. Minute resolution estimates of the diffuse fraction of global irradiance for southeastern Australia. Solar Energy 2015, 116, 215–237. [CrossRef]
Tsubo, M.; Walker, S. Relationships between photosynthetically active radiation and clearness index at Bloemfontein, South Africa. Theoretical and Applied Climatology 2005, 80, 17–25. [CrossRef]
Nijegorodov, N. Improved ASHRAE model to predict hourly and daily solar radiation components in Botswana, Namibia, and Zimbabwe. Renewable Energy 1996, 9, 1270–1273. World Renewable Energy Congress Renewable Energy, Energy Efficiency and the Environment. [CrossRef]
Mabasa, B.; Lysko, M.D.; Tazvinga, H.; Zwane, N.; Moloi, S.J. The Performance Assessment of Six Global Horizontal Irradiance Clear Sky Models in Six Climatological Regions in South Africa. Energies 2021, 14. [CrossRef]
Mahachi, T. Energy yield analysis and evaluation of solar irradiance models for a utility scale solar PV plant in South Africa. Master’s thesis, University of Stellenbosch, South Africa, 2016. [CrossRef]
Daniel, F.M.; Rix, A.J. The Evaluation of Decomposition Models Under Varying Weather Conditions for South Africa. Southern African Sustainable Energy Conference, 2021, pp. 207–213.
SAURAN. https://sauran.ac.za/, 2022.
Daniel-Durandt, F.M.; Rix, A.J. Automating Quality Control of Irradiance Data with a Comprehensive Analysis for Southern Africa. Solar 2023, 3, 596–617. [CrossRef]
Walpole, R.; Myers, R.; Myers, S.; Ye, K. Probability & Statistics for Engineers & Scientists; Pearson Education Inc.: Boston, MA, 2012.
Anaconda Software Distribution, 2020.
Holmgren, W.; Hansen, C.; Mikofski, M. pvlib python: a python package for modeling solar energy systems. Journal of Open Source Software 2018, 3, 884. [CrossRef]
McKinney, W.; et al. Data structures for statistical computing in python. Proceedings of the 9th Python in Science Conference. Austin, TX, 2010, Vol. 445, pp. 51–56. [CrossRef]
Hunter, J.D. Matplotlib: A 2D graphics environment. Computing in Science & Engineering 2007, 9, 90–95. [CrossRef]
Mahachi, T.; Rix, A. Evaluation of irradiance decomposition and transposition models for a region in South Africa Investigating the sensitivity of various diffuse radiation models. IECON 2016 - 42nd Annual Conference of the IEEE Industrial Electronics Society, 2016, pp. 3064–3069. [CrossRef]
Kasten, F.; Young, A.T. Revised optical air mass tables and approximation formula. Appl. Opt. 1989, 28, 4735–4738. [CrossRef]
Farmer, W.; Rix, A.J. Mapping the spatial perturbations seen by the power system network due to intermittent renewable energy sources. Southern African Sustainable Energy Conference, 2021, pp. 222–228.
Solargis. https://solargis.com/maps-and-gis-data/download/south-africa, 2021.

Figure 1. The irradiance relationships between GHI, DNI, DHI and $\theta_{Z}$.

Figure 2. Validation sites of discussed decomposition models.

Figure 4. Distribution of data within clusters.

Table 1. SAURAN station summary [39,40].

Station	Name (Location)	Coordinates (Lat (°S), Long (°E))	Elevation (m)
CSIR	CSIR Energy Centre (Pretoria, South Africa)	25.747, 28.279	1400
CUT	Central University of Technology (Bloemfontein, South Africa)	29.121, 26.216	1397
FRH	University of Fort Hare (Alice, South Africa)	32.785, 26.845	540
GRT	Graaff-Reinet (Graaff-Reinet, South Africa)	32.485, 24.586	660
HLO	Mariendal (Mariendal, South Africa)	33.854, 18.824	178
ILA	Ilanga CSP Plant (Upington, South Africa)	28.490, 21.520	884
KZH	University of KwaZulu-Natal Howard College (Durban, South Africa)	29.871, 30.977	150
KZW	University of KwaZulu-Natal Westville (Durban, South Africa)	29.817, 30.945	200
MIN	CRSES Mintek (Johannesburg, South Africa)	26.089, 27.978	1521
NMU	Nelson Mandela University (Gqeberha, South Africa)	34.009, 25.665	35
NUST	Namibian University of Science and Technology (Windhoek, Namibia)	22.565, 17.075	1683
RVD	Richtersveld (Alexander Bay, South Africa)	28.561, 16.761	141
SUN	Stellenbosch University (Stellenbosch, South Africa)	33.935, 18.867	119
UBG	Gaborone (Gaborone, Botswana)	24.661, 25.934	1014
UFS	University of Free State (Bloemfontein, South Africa)	29.111, 26.185	1491
UNV	Venda (Vuwani, South Africa)	23.131, 30.424	628
UNZ	University of Zululand (KwaDlangezwa, South Africa)	28.853, 31.852	90
UPR	University of Pretoria (Pretoria, South Africa)	25.753, 28.229	1410
VAN	Vanrhynsdorp (Vanrhynsdorp, South Africa)	31.617, 18.738	130

Table 4. Clusters mean irradiances.

Cluster	Mean GHI ⁴ [$W / m^{2}$]	Mean DNI ⁴ [$W / m^{2}$]	Mean DHI ⁴ [$W / m^{2}$]
Cluster 1	592	669	135
Cluster 2	583	604	165
Cluster 3	534	523	178
Cluster 4	557	579	158

⁴ Mean values of training set

Table 5. Hourly validation results of decomposition model development for CSIR.

Model	Entire Dataset				$K_{t} < 0.60$	$K_{t} \geq 0.60$
Model	$R^{2}$	MBE [%]	RMSE [%]	MAE [%]	MAE [%]	MAE [%]

Table 6. Hourly validation results of decomposition model development for CUT.

Model	Entire Dataset				$K_{t} < 0.60$	$K_{t} \geq 0.60$
Model	$R^{2}$	MBE [%]	RMSE [%]	MAE [%]	MAE [%]	MAE [%]

Table 7. Hourly validation results of decomposition model development for FRH.

Model	Entire Dataset				$K_{t} < 0.60$	$K_{t} \geq 0.60$
Model	$R^{2}$	MBE [%]	RMSE [%]	MAE [%]	MAE [%]	MAE [%]

Table 8. Hourly validation results of decomposition model development for GRT.

Model	Entire Dataset				$K_{t} < 0.60$	$K_{t} \geq 0.60$
Model	$R^{2}$	MBE [%]	RMSE [%]	MAE [%]	MAE [%]	MAE [%]

Table 9. Hourly validation results of decomposition model development for HLO.

Model	Entire Dataset				$K_{t} < 0.60$	$K_{t} \geq 0.60$
Model	$R^{2}$	MBE [%]	RMSE [%]	MAE [%]	MAE [%]	MAE [%]

Table 10. Hourly validation results of decomposition model development for KZH.

Model	Entire Dataset				$K_{t} < 0.60$	$K_{t} \geq 0.60$
Model	$R^{2}$	MBE [%]	RMSE [%]	MAE [%]	MAE [%]	MAE [%]

Table 11. Hourly validation results of decomposition model development for KZW.

Model	Entire Dataset				$K_{t} < 0.60$	$K_{t} \geq 0.60$
Model	$R^{2}$	MBE [%]	RMSE [%]	MAE [%]	MAE [%]	MAE [%]

Table 12. Hourly validation results of decomposition model development for NMU.

Model	Entire Dataset				$K_{t} < 0.60$	$K_{t} \geq 0.60$
Model	$R^{2}$	MBE [%]	RMSE [%]	MAE [%]	MAE [%]	MAE [%]

Table 13. Hourly validation results of decomposition model development for NUST.

Model	Entire Dataset				$K_{t} < 0.60$	$K_{t} \geq 0.60$
Model	$R^{2}$	MBE [%]	RMSE [%]	MAE [%]	MAE [%]	MAE [%]

Table 14. Hourly validation results of decomposition model development for RVD.

Model	Entire Dataset				$K_{t} < 0.60$	$K_{t} \geq 0.60$
Model	$R^{2}$	MBE [%]	RMSE [%]	MAE [%]	MAE [%]	MAE [%]

Table 15. Hourly validation results of decomposition model development for SUN.

Model	Entire Dataset				$K_{t} < 0.60$	$K_{t} \geq 0.60$
Model	$R^{2}$	MBE [%]	RMSE [%]	MAE [%]	MAE [%]	MAE [%]

Table 16. Hourly validation results of decomposition model development for UBG.

Model	Entire Dataset				$K_{t} < 0.60$	$K_{t} \geq 0.60$
Model	$R^{2}$	MBE [%]	RMSE [%]	MAE [%]	MAE [%]	MAE [%]

Table 17. Hourly validation results of decomposition model development for UFS.

Model	Entire Dataset				$K_{t} < 0.60$	$K_{t} \geq 0.60$
Model	$R^{2}$	MBE [%]	RMSE [%]	MAE [%]	MAE [%]	MAE [%]

Table 18. Hourly validation results of decomposition model development for UNV.

Model	Entire Dataset				$K_{t} < 0.60$	$K_{t} \geq 0.60$
Model	$R^{2}$	MBE [%]	RMSE [%]	MAE [%]	MAE [%]	MAE [%]

Table 19. Hourly validation results of decomposition model development for UNZ.

Model	Entire Dataset				$K_{t} < 0.60$	$K_{t} \geq 0.60$
Model	$R^{2}$	MBE [%]	RMSE [%]	MAE [%]	MAE [%]	MAE [%]

Table 20. Hourly validation results of decomposition model development for UPR.

Model	Entire Dataset				$K_{t} < 0.60$	$K_{t} \geq 0.60$
Model	$R^{2}$	MBE [%]	RMSE [%]	MAE [%]	MAE [%]	MAE [%]

Table 21. Hourly validation results of decomposition model development for VAN.

Model	Entire Dataset				$K_{t} < 0.60$	$K_{t} \geq 0.60$
Model	$R^{2}$	MBE [%]	RMSE [%]	MAE [%]	MAE [%]	MAE [%]

Table 22. Summary of test and validation sets of stations outperforming baseline models.

Dataset	Localised model outperforms baseline models		Cluster model outperforms baseline models		Regional model outperforms baseline models	
	Test	Validation	Test	Validation	Test	Validation
CSIR	✓	✓	✓	✓	✓	✓
CUT	✓	✓	✓	✓	✓	✓
FRH	✓	✓	✓	✓	✓	✓
GRT	✓	✓	✓	✓	✓	✓
HLO	✓	✓	✓	✓	✓	✓
ILA	-	✓	-	✓	-	✓
KZH	✓	✓	✓	✓	✓	✓
KZW	✓	✓	✓	✓	✓	✓
MIN	-	✓	-	✓	-	✓
NMU	✓	✓	✓	✓	✓	✓
NUST	✓	✓	✓	✓	✓	✓
RVD	✓	✓	✓	✓	✓	✓
SUN	✓	✓	✓	✓	✓	✓
UBG	✓	✓	✓	✓	✓	✓
UFS	✓	✓	✓	✓	✓	✓
UNV	✓	✓	✓	✓	✓	✓
UNZ	✓	✓	✓	✓	✓	✓
UPR	✓	✓	✓	✓	✓	✓
VAN	✓	✓	✓	✓	✓	✓

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the authors and the preprint are cited in any reuse.
