Some Real Data Analysis:
The first three data sets were obtained from the website of the Organisation for Economic Co-operation and Development (OECD), which provides data on the economy, social conditions, education, health, labor, and the environment in its member countries. The data are available at
https://stats.oecd.org/index.aspx?DataSetCode=BLI
First data set: (Dwellings Without Basic Facilities)
These observations measure the proportion of homes in the member countries that lack essential utilities such as indoor plumbing, central heating, and clean drinking water supplies.
| 0.008 | 0.007 | 0.002 | 0.094 | 0.123 | 0.023 | 0.005 | 0.005 | 0.057 | 0.004 |
| 0.005 | 0.001 | 0.004 | 0.035 | 0.002 | 0.006 | 0.064 | 0.025 | 0.112 | 0.118 |
| 0.001 | 0.259 | 0.001 | 0.023 | 0.009 | 0.015 | 0.002 | 0.003 | 0.049 | 0.005 |
| 0.001 |
Second data set: (Quality of Support Network)
This data set measures how much a person can rely on sources of support such as family, friends, or community members in times of need and distress. It is expressed as the proportion of persons who found social support in times of crisis.
| 0.98 | 0.96 | 0.95 | 0.94 | 0.93 | 0.8 | 0.82 | 0.85 | 0.88 | 0.89 |
| 0.78 | 0.92 | 0.92 | 0.9 | 0.96 | 0.96 | 0.94 | 0.77 | 0.95 | 0.91 |
Third data set: (Educational Attainment)
The observations measure the proportion of the population in the OECD database that has completed a high level of education, such as high school or its equivalent.
| 0.84 | 0.86 | 0.8 | 0.92 | 0.67 | 0.59 | 0.43 | 0.94 | 0.82 | 0.91 |
| 0.91 | 0.81 | 0.86 | 0.76 | 0.86 | 0.76 | 0.85 | 0.88 | 0.63 | 0.89 |
| 0.89 | 0.94 | 0.74 | 0.42 | 0.81 | 0.81 | 0.93 | 0.55 | 0.92 | 0.9 |
| 0.63 | 0.84 | 0.89 | 0.42 | 0.82 | 0.92 |
Fourth data set: (Flood Data)
These are 20 observations of the maximum flood level of the Susquehanna River at Harrisburg, Pennsylvania (Dumonceaux & Antle, 1973).
| 0.26 | 0.27 | 0.3 | 0.32 | 0.32 | 0.34 | 0.38 | 0.38 | 0.39 | 0.4 |
| 0.41 | 0.42 | 0.42 | 0.42 | 0.45 | 0.48 | 0.49 | 0.61 | 0.65 | 0.74 |
Fifth data set: (Time Between Failures of Secondary Reactor Pumps) (Maya et al., 2024; Suprawhardana and Prayoto, 1999)
| 0.216 | 0.015 | 0.4082 | 0.0746 | 0.0358 | 0.0199 | 0.0402 | 0.0101 | 0.0605 |
| 0.0954 | 0.1359 | 0.0273 | 0.0491 | 0.3465 | 0.007 | 0.656 | 0.106 | 0.0062 |
| 0.4992 | 0.0614 | 0.532 | 0.0347 | 0.1921 |
The data sets above are fitted with the following unit distributions: Beta, Topp-Leone, Unit-Lindley, and Kumaraswamy, and these fits are compared with that of the new distribution (MBUR). The tools for comparison are -2LL, AIC, corrected AIC, BIC, and the Hannan-Quinn Information Criterion (HQIC). The Kolmogorov-Smirnov (K-S) test is also conducted: its statistic is reported together with the decision on the null hypothesis H0, which states that the data set follows the tested distribution, and the P-value of the test. Figures show the empirical CDF (eCDF) against the theoretical CDFs of the 5 distributions. The estimated parameters, their estimated variances, and their standard errors are also reported.
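The arithmetic that links the NLL row to the AIC, corrected AIC, and BIC rows of the results tables can be checked with a short sketch. It assumes the usual definitions AIC = 2k - 2l, corrected AIC = AIC + 2k(k+1)/(n - k - 1), and BIC = k ln(n) - 2l, and it takes the NLL-row entry as the term l, which is the reading that reproduces the tabulated numbers. The function and argument names are illustrative. HQIC is left out because the scaling used for that row is not stated.

```python
import math


def information_criteria(nll_row_value, k, n):
    """Reproduce the table arithmetic for AIC, corrected AIC and BIC.

    nll_row_value: entry of the NLL row, used here as the likelihood term l
                   (an assumption inferred from the tabulated numbers)
    k: number of estimated parameters
    n: sample size
    """
    aic = 2 * k - 2 * nll_row_value
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = k * math.log(n) - 2 * nll_row_value
    return aic, aicc, bic


# Beta fit to the first data set: two parameters, 31 observations.
aic, aicc, bic = information_criteria(-78.7767, k=2, n=31)
print(round(aic, 4), round(aicc, 4), round(bic, 4))
```

For the Beta fit to the first data set, the three results agree with the tabulated 161.5535, 161.982, and 164.4214 to about three decimals.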
First data set (Dwellings without basic facilities):
| | Beta | Kumaraswamy | MBUR | Topp-Leone | Unit-Lindley |
| theta | | | 2.3519 | 0.2571 | 26.1445 |
| Var | 0.0323, 0.6661 | 0.0086, 0.2424 | 0.023 | 0.0021 | 20.5623 |
| | 0.6661, 22.1589 | 0.2424, 9.228 | | | |
| SE | 0.03227 | 0.01666 | 0.0272 | 0.0082 | 0.8144 |
| | 0.8455 | 0.5456 | | | |
| AIC | 161.5535 | 163.8979 | 147.3057 | 137.593 | 144.5918 |
| AIC corrected | 161.982 | 164.3265 | 147.4436 | 137.7309 | 144.7297 |
| BIC | 164.4214 | 166.7659 | 148.7397 | 139.027 | 146.0258 |
| HQIC | 4.2816 | 4.2851 | 4.2611 | 4.2448 | 4.2567 |
| NLL | -78.7767 | -79.9489 | -72.6528 | -67.7965 | -71.2959 |
| K-S value | 0.2052 | 0.1742 | 0.2034 | 0.2818 | 0.3762 |
| H0 | Fail to reject | Fail to reject | Fail to reject | Reject | Reject |
| P-value | 0.1271 | 0.271 | 0.1336 | 0.0114 | 0.000189 |
Figure 11. The eCDF vs. the theoretical CDFs of the 5 distributions for the 1st data set (Dwellings without basic facilities).
As the analysis shows, 3 distributions fit the data better than the others: the Beta, Kumaraswamy, and MBUR distributions. For these three, the K-S test failed to reject the null hypothesis H0 that the data come from the tested distribution. Among them, MBUR had the lowest absolute NLL, AIC, corrected AIC, BIC, and HQIC values.
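The K-S value for one column of the table can be retraced with a small sketch. It assumes the one-parameter Topp-Leone CDF F(x) = [x(2 - x)]^theta on (0, 1), which is not restated in this section, and plugs in the tabulated estimate 0.2571. The helper names are illustrative.

```python
def topp_leone_cdf(x, theta):
    # Assumed one-parameter Topp-Leone CDF on (0, 1).
    return (x * (2.0 - x)) ** theta


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov distance between the eCDF and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The eCDF jumps from (i-1)/n to i/n at x; check both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# First data set (Dwellings without basic facilities), 31 observations.
dwellings = [
    0.008, 0.007, 0.002, 0.094, 0.123, 0.023, 0.005, 0.005, 0.057, 0.004,
    0.005, 0.001, 0.004, 0.035, 0.002, 0.006, 0.064, 0.025, 0.112, 0.118,
    0.001, 0.259, 0.001, 0.023, 0.009, 0.015, 0.002, 0.003, 0.049, 0.005,
    0.001,
]

d_tl = ks_statistic(dwellings, lambda x: topp_leone_cdf(x, 0.2571))
print(round(d_tl, 4))
```

Under these assumptions the distance comes out at about 0.28, in line with the K-S value reported in the Topp-Leone column. Passing a different CDF as the second argument gives the corresponding distance for another fitted distribution.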
Second data set (Quality of support network):
| | Beta | Kumaraswamy | MBUR | Topp-Leone | Unit-Lindley |
| theta | | | 0.3591 | 71.2975 | 0.1334 |
| Var | 86.461, 9.0379 | 15.7459, 3.2005 | 0.0008 | 254.1667 | 0.00045 |
| | 9.0379, 1.0646 | 3.2005, 1.0347 | | | |
| SE | 2.079 | 0.8873 | 0.0063 | 3.565 | 0.0047 |
| | 0.231 | 0.2275 | | | |
| AIC | 64.5056 | 64.7274 | 62.0790 | 60.6796 | 61.3746 |
| AIC corrected | 65.2115 | 65.4333 | 62.3012 | 60.9018 | 61.5968 |
| BIC | 66.497 | 66.7188 | 63.0747 | 61.6753 | 62.3703 |
| HQIC | 3.9289 | 3.9299 | 3.9224 | 3.9159 | 3.9191 |
| NLL | -30.2528 | -30.3637 | -30.0395 | -29.3398 | -29.6873 |
| K-S value | 0.0974 | 0.0995 | 0.1309 | 0.1327 | 0.1057 |
| H0 | Fail to reject | Fail to reject | Fail to reject | Fail to reject | Fail to reject |
| P-value | 0.9416 | 0.9513 | 0.8399 | 0.4627 | 0.954 |
Figure 12. The eCDF vs. the theoretical CDFs of the 5 distributions for the 2nd data set (Quality of support network).
As the analysis shows, all 5 distributions fit the data well, because the K-S test failed to reject the null hypothesis H0 that the data come from the tested distribution. Topp-Leone had the lowest absolute NLL, AIC, corrected AIC, BIC, and HQIC values, followed in ascending order by Unit-Lindley, MBUR, Beta, and Kumaraswamy.
Notably, the metric values of the MBUR distribution are comparable to those of Topp-Leone and Unit-Lindley, which indicates that the new distribution (MBUR) fits these data well.
Third data set (Educational Attainment):
| | Beta | Kumaraswamy | MBUR | Topp-Leone | Unit-Lindley |
| theta | | | 0.5556 | 13.4254 | 0.2905 |
| Var | 3.4283, 1.0938 | 1.3854, 0.5232 | 0.0011 | 5.0067 | 0.0012 |
| | 1.0938, 0.416 | 0.5232, 0.3234 | | | |
| SE | 0.3086 | 0.1962 | 0.0055 | 0.373 | 0.0058 |
| | 0.1075 | 0.0948 | | | |
| AIC | 54.6152 | 55.5937 | 52.8713 | 44.5725 | 60.9322 |
| AIC corrected | 54.9789 | 55.9573 | 52.9890 | 44.6901 | 61.0498 |
| BIC | 57.7823 | 58.7607 | 54.4549 | 46.1560 | 62.5157 |
| HQIC | 4.0422 | 4.0471 | 4.0384 | 3.9916 | 4.0764 |
| NLL | -25.3076 | -25.7968 | -25.4357 | -21.2862 | -29.4661 |
| K-S value | 0.1453 | 0.1390 | 0.1468 | 0.2493 | 0.0722 |
| H0 | Fail to reject | Fail to reject | Fail to reject | Reject | Fail to reject |
| P-value | 0.2055 | 0.2411 | 0.1979 | 0.0062 | 0.8300 |
Figure 13. The eCDF vs. the theoretical CDFs of the 5 distributions for the 3rd data set (Educational Attainment).
As the analysis shows, 4 distributions fit the data well; only Topp-Leone failed to fit. For the other four, the K-S test failed to reject the null hypothesis H0 that the data come from the tested distribution. Among these four, MBUR had the lowest AIC, corrected AIC, BIC, and HQIC values in comparison with Beta, Kumaraswamy, and Unit-Lindley; its absolute NLL (25.4357) was slightly above that of Beta (25.3076).
Fourth data set (Flood Data):
| | Beta | Kumaraswamy | MBUR | Topp-Leone | Unit-Lindley |
| theta | | | 1.0443 | 2.2413 | 1.6268 |
| Var | 7.22, 7.2316 | 0.3651, 2.8825 | 0.007 | 0.2512 | 0.0819 |
| | 7.2316, 8.0159 | 2.8825, 29.963 | | | |
| SE | 0.6008 | 0.1351 | 0.0187 | 0.1121 | 0.0639 |
| | 0.6331 | 1.2239 | | | |
| AIC | 32.3671 | 29.9466 | 14.9234 | 16.7628 | 16.3454 |
| AIC corrected | 33.073 | 30.6524 | 15.1455 | 16.985 | 16.5676 |
| BIC | 34.3586 | 31.938 | 15.9191 | 17.7585 | 17.3411 |
| HQIC | 3.7154 | 3.6893 | 3.456 | 3.4996 | 3.4902 |
| NLL | -14.1836 | -12.9733 | -6.4617 | -7.3814 | -7.1727 |
| K-S value | 0.2063 | 0.2175 | 0.3202 | 0.3409 | 0.2625 |
| H0 | Fail to reject | Fail to reject | Fail to reject | Reject | Fail to reject |
| P-value | 0.3174 | 0.2602 | 0.0253 | 0.0141 | 0.0311 |
Figure 14. The eCDF vs. the theoretical CDFs of the 5 distributions for the 4th data set (Flood Data).
As reported in the table, 4 distributions fit the data; only Topp-Leone was rejected by the K-S test, while for the other four the test failed to reject the null hypothesis H0 that the data come from the tested distribution. The reported P-values for MBUR (0.0253) and Unit-Lindley (0.0311) are, however, below 0.05, so the support for these two fits is weaker than for Beta and Kumaraswamy. MBUR had the lowest absolute NLL, AIC, corrected AIC, BIC, and HQIC values in comparison with Beta, Kumaraswamy, and Unit-Lindley.
Notably, the metric values of the MBUR distribution are comparable to those of Topp-Leone, which indicates that the new distribution (MBUR) fits these data well.
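The Topp-Leone column for the flood data can be retraced in closed form, which gives a quick check on how the theta and NLL rows are computed. The sketch below assumes the one-parameter Topp-Leone density f(x) = 2 theta (1 - x) [x(2 - x)]^(theta - 1) on (0, 1), which is not restated in this section, and reads the fifteenth observation as 0.45. The function names are illustrative.

```python
import math

# Flood data (Dumonceaux & Antle, 1973), 20 observations.
flood = [
    0.26, 0.27, 0.30, 0.32, 0.32, 0.34, 0.38, 0.38, 0.39, 0.40,
    0.41, 0.42, 0.42, 0.42, 0.45, 0.48, 0.49, 0.61, 0.65, 0.74,
]


def topp_leone_mle(sample):
    # Setting d(logL)/d(theta) = 0 gives theta_hat = -n / sum(log(x(2-x))).
    s = sum(math.log(x * (2.0 - x)) for x in sample)
    return -len(sample) / s


def topp_leone_nll(sample, theta):
    # Negative log-likelihood under the assumed Topp-Leone density.
    ll = sum(
        math.log(2.0 * theta)
        + math.log(1.0 - x)
        + (theta - 1.0) * math.log(x * (2.0 - x))
        for x in sample
    )
    return -ll


theta_hat = topp_leone_mle(flood)
nll = topp_leone_nll(flood, theta_hat)
print(round(theta_hat, 4), round(nll, 4))
```

Under these assumptions the estimate comes out near 2.2413 and the negative log-likelihood near -7.38, in line with the theta and NLL entries of the Topp-Leone column.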
Fifth data set (Time Between Failures of Secondary Reactor Pumps):
| | Beta | Kumaraswamy | MBUR | Topp-Leone | Unit-Lindley |
| theta | | | 1.7886 | 0.4891 | 4.1495 |
| Var | 0.071, 0.2801 | 0.0198, 0.1033 | 0.018 | 0.0104 | 0.5543 |
| | 0.2801, 1.647 | 0.1033, 0.9135 | | | |
| SE | 0.0555 | 0.0293 | 0.0279 | 0.0213 | 0.1552 |
| | 0.2676 | 0.1993 | | | |
| AIC | 44.0571 | 44.6592 | 41.862 | 39.5653 | 31.007 |
| AIC corrected | 44.6571 | 45.2592 | 42.0525 | 39.7558 | 31.1975 |
| BIC | 46.3281 | 46.9302 | 42.9975 | 40.7008 | 32.1425 |
| HQIC | 3.8556 | 3.8598 | 3.8472 | 3.8303 | 3.7549 |
| NLL | -20.0285 | -20.3296 | -19.9310 | -18.7827 | -14.5035 |
| K-S value | 0.1541 | 0.1393 | 0.1584 | 0.1962 | 0.3274 |
| H0 | Fail to reject | Fail to reject | Fail to reject | Fail to reject | Reject |
| P-value | 0.5918 | 0.7123 | 0.5575 | 0.2982 | 0.0107 |
Figure 15. The eCDF vs. the theoretical CDFs of the 5 distributions for the 5th data set (Time Between Failures of Secondary Reactor Pumps).
As the analysis shows, 4 distributions fit the data well; only Unit-Lindley failed to fit. For the other four, the K-S test failed to reject the null hypothesis H0 that the data come from the tested distribution. Among these four, Topp-Leone had the lowest absolute NLL, AIC, corrected AIC, BIC, and HQIC values, followed in ascending order by MBUR, Beta, and Kumaraswamy.
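The selection rule used in these comparisons (discard the distributions rejected by the K-S test, then sort the rest by a criterion) can be written out as a short sketch. The AIC and P-value entries are copied from the fifth results table; the 0.05 significance level and the helper name are assumptions made for illustration.

```python
# (AIC, K-S P-value) per distribution, copied from the fifth results table.
results = {
    "Beta": (44.0571, 0.5918),
    "Kumaraswamy": (44.6592, 0.7123),
    "MBUR": (41.862, 0.5575),
    "Topp-Leone": (39.5653, 0.2982),
    "Unit-Lindley": (31.007, 0.0107),
}

ALPHA = 0.05  # assumed significance level for the K-S decision


def rank_adequate_fits(table, alpha=ALPHA):
    # Keep distributions the K-S test does not reject, then sort by AIC.
    kept = {name: aic for name, (aic, p) in table.items() if p >= alpha}
    return sorted(kept, key=kept.get)


ranking = rank_adequate_fits(results)
print(ranking)
```

For the fifth data set this drops Unit-Lindley and orders the remaining fits as Topp-Leone, MBUR, Beta, Kumaraswamy, the same ascending order given in the text.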