Preprint
Article

Potential of Machine Learning for Predicting Sleep Disorders: A Comprehensive Analysis of Regression and Classification Models

A peer-reviewed article of this preprint also exists.

Submitted: 13 December 2023

Posted: 14 December 2023

Abstract
Sleep disorder is a condition that can be categorized as both an emotional and a physical problem. It causes several difficulties, such as daytime distress, sleep-wake disorders, and anxiety. Hence, the main objective of this research is to utilize the strong capabilities of machine learning to predict sleep disorders. Specifically, this research aims to meet three objectives: to identify the best regression model, the best classification model, and the best learning strategy for sleep disorder datasets. Considering two related datasets and several evaluation metrics for the regression and classification tasks, the results revealed the superiority of the MultilayerPerceptron, SMOreg, and KStar regression models among the twenty-three regression models considered. Also, IBK, RandomForest, and RandomizableFilteredClassifier showed superior performance compared with the other classification models, which belong to several learning strategies. Finally, the Function learning strategy showed the best predictive performance among the six considered strategies in both datasets and with respect to most evaluation metrics.
Keywords: 
Subject: Computer Science and Mathematics  -   Artificial Intelligence and Machine Learning

1. Introduction

Sleep is an essential natural activity for humans and plays a very important role in everyone's health [1]. While we sleep, the body supports healthy brain function and maintains the necessary physical health [2]. Moreover, sleep is very important for body development and growth, especially in children and teenagers. Sleep affects how we think, work, learn, and react, among many other aspects of daily life. It also affects the circulatory, immune, and respiratory systems [3].
On the other hand, lack of sleep and sleep disorders cause several problems and difficulties in daily life [4]. To name a few, sleep disorders increase the levels of hormones that control hunger, increase the consumption of sweet, salty, and fatty foods, decrease physical activity, and increase the risk of obesity, stroke, and heart disease [5]. They may also cause stress, fatigue, and functional impairment [6,7]. Moreover, sleep apnea is one of the main sleep disorders. According to recent statistics from U.S. census data, more than 140 million people (70 million men, 50 million women, and 20 million children) snore, mostly because of sleep apnea. Globally, around 936 million adults suffer from mild to severe sleep apnea. Moreover, according to several global studies, between around 10% and 30% of the world's population suffer from sleep disorders, and in some countries the percentage may reach 60%. Furthermore, the prevalence of sleep disorders is nearly 7% higher among women than among men. Finally, sleep disorders represent a global epidemic that threatens the health and quality of life of around 45% of the world's population.
A review of the recent literature on sleep disorders shows that three research directions dominate this field. Firstly, the relationship between COVID-19 and sleep disorders. Secondly, the search for new tests for obstructive sleep apnea (OSA) that are less costly and more comfortable for potential patients, which is an urgent need. Finally, the use of machine learning and wearable devices with fewer sensors to diagnose sleep disorders at home, without the need to sleep in specialized sleep centers.
Consequently, this research aimed to provide additional knowledge and contribute to solving the sleep disorder problem by utilizing machine learning capabilities to predict sleep disorders [8]. Specifically, this research pursued three main objectives:
  • To identify the regression model, among twenty-three different regression models, that best suits sleep disorder datasets
  • To identify the classification model, among twenty-nine different classification models, that best suits sleep disorder datasets
  • To identify the learning strategy, among six different well-known strategies, that best suits sleep disorder datasets
Therefore, this research considered two main machine learning tasks: regression and classification. Both tasks predict unknown values [9]; the difference is that regression predicts numeric values, while classification predicts non-numeric (categorical) values [10,33].
Classification is defined as the ability to accurately predict the class label of unseen cases or examples [11,12]. Classification is of two types: single-label classification (SLC) and multi-label classification (MLC). The former associates every instance with exactly one class label, while the latter may associate an instance with more than one class label [13–15].
SLC is further divided into two subtypes: binary classification and multiclass classification [16]. In binary classification, the dataset contains only two class labels [17,18]; in multiclass classification, it contains more than two. The classification task in this research is a multiclass one [19].
Regression is defined as the task of understanding the relationship between the objective (dependent) variable and the features (independent variables) in the dataset [20]. The objective variable in regression must be continuous; regression is a main supervised task in machine learning that aims to predict the value of a continuous variable based on a set of known variables [21]. Regression has many real-life applications, such as forecasting house prices [22], predicting users' trends [23], and predicting interest rates [24,25], among several others.
To achieve the first objective, this research considered twenty-three regression models that belong to five learning strategies. These regression models were evaluated and compared on two datasets with respect to five well-known evaluation metrics. To achieve the second objective, twenty-nine classification models were evaluated and compared with respect to five popular metrics in the domain of classification. To achieve the third objective, the average performance of the models belonging to each learning strategy was compared.
The rest of the paper is organized as follows: Section 2 reviews the most recent work related to sleep disorder detection using machine learning techniques. Section 3 describes the research methodology and the considered datasets. Section 4 provides the empirical results, and Section 5 discusses the main findings. Section 6 concludes the paper and suggests a future direction.

2. Related Work

Everyone requires sleep; it is a crucial component of how our bodies work. Individual needs vary, but doctors advise people to get seven to nine hours of sleep per night. Many people face sleeping problems known as sleep disorders. Sleep disorders are conditions in which the usual sleep pattern or sleep behavior is disrupted; the main sleep disorders include insomnia, hypersomnia, obstructive sleep apnea, and parasomnias.
Several of these disorders may contribute to other medical problems and may also be signs of underlying mental health problems, which has motivated extensive research. In [26], the authors presented a thorough study of the relationship between vitamin D and sleep problems in children and adolescents who suffer from sleep disorders such as insomnia, obstructive sleep apnea (OSA), and restless legs syndrome (RLS). The research synthesized information regarding the role and mechanism of action of vitamin D. A review of the use of melatonin and its potential mechanisms in the sleep disturbances of patients with Parkinson's disease can be found in [27].
In [28], the researchers conducted a systematic study and meta-analysis to identify the key factors contributing to sleep and anxiety problems during the COVID-19 pandemic lockdown. The study also aimed to predict potential correlations and determinants associated with pandemic-induced stress and difficulties, and it analyzed the various sleep-related symptoms and complaints that people experienced. The Pittsburgh Sleep Quality Index (PSQI), machine learning algorithms, and a general assessment of anxiety disorders were used to analyze the outcomes. The study reported a significant correlation between the COVID-19 lockdown and symptoms such as poor sleep, anxiety, depressive symptoms, and insomnia.
In [29], a cross-validated model was proposed for classifying sleep quality based on actigraphy data. The final classification model, assessed using two machine learning techniques, support vector machines (SVM) and K-nearest neighbors (KNN), demonstrated acceptable accuracy and other performance metrics. The findings of this research can be used to help treat sleep disorders, to design new methods for measuring and monitoring sleep quality, and to enhance current devices and sensors.
In [30], the authors proposed a general-purpose sleep monitoring system that can be used to monitor bed exits, assess the risk of developing pressure ulcers, and monitor the impact of medicines on sleep disorders. They also compared a number of supervised learning algorithms to find the most appropriate one for this setting. The experimental results showed that the selected supervised algorithms can accurately infer sleep duration, sleep postures, and routines in a fully unobtrusive way.
In [31], the authors proposed a reliable approach for classifying sleep stages according to the American Academy of Sleep Medicine (AASM) standard, based on a single channel of electroencephalogram (EEG) data. The main contributions of this work were the use of statistical features to analyze sleep characteristics and three distinct feature combinations used to classify two-state sleep phases. Both patients with sleep disorders and healthy control subjects participated in three separate trials with three distinct feature sets, and several machine learning classifiers were developed to classify the various sleep stages.

3. Materials and Methods

This section describes the experimental design and the considered datasets, along with the required preprocessing steps. The evaluation results for the twenty-three regression models and the comparative analysis of the twenty-nine classification models (classifiers) are then provided in Section 4, and the most interesting findings are discussed in Section 5.
Regarding the experimental design, all classification and regression models were used with their default settings and parameters, except for the IBK algorithm, whose number-of-nearest-neighbors (KNN) parameter was changed from 1 to 3. The considered models were implemented using the Python programming language, and the experiments were conducted on a machine with an Intel Core i3 processor. Finally, to handle missing values, each missing value was replaced with the average of that feature's values within the same class. The main phases of the research methodology are shown in Figure 1.
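The class-wise mean imputation described above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' code: each row is a list of numeric feature values with None marking a missing entry, the "class" is taken to be the row's target label, and the function name impute_class_mean is hypothetical.

```python
from collections import defaultdict

def impute_class_mean(rows, labels):
    """Return a copy of rows in which each None is replaced by the mean of that
    feature over the rows that share the same class label."""
    n_features = len(rows[0])
    # Per-class running sums and counts of the observed (non-missing) values.
    sums = defaultdict(lambda: [0.0] * n_features)
    counts = defaultdict(lambda: [0] * n_features)
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if value is not None:
                sums[label][j] += value
                counts[label][j] += 1
    imputed = []
    for row, label in zip(rows, labels):
        new_row = []
        for j, value in enumerate(row):
            if value is None:
                if counts[label][j] == 0:
                    raise ValueError(f"feature {j} is never observed in class {label!r}")
                value = sums[label][j] / counts[label][j]
            new_row.append(value)
        imputed.append(new_row)
    return imputed
```

Because the imputed value depends on the class label, applying it before splitting the data into training and test parts lets label information leak into the test features; the paper does not say how this was handled.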

3.1. Datasets and Preprocessing Steps

Two datasets were considered in this research. The first one (Dataset 1) consists of 62 cases and 11 features. It is an extended version of the second dataset (Dataset 2), to which three features were added. Both datasets contain missing values. The main goal of collecting the datasets was to study sleep patterns in mammals; another goal was to identify the main factors affecting sleep quality and to diagnose the main risks related to sleep disorders. The main features (attributes) in both datasets are body weight, brain weight, predation index, sleep exposure index, gestation time, and danger index. All of these features are numerical, and both datasets have five class labels. Both datasets are publicly shared on Kaggle and freely available at the following URL: (https://www.kaggle.com/datasets/volkandl/sleep-in-mammals, accessed on 12 December 2023). Table 1 summarizes the main characteristics of the considered datasets, while Table 2 provides more information regarding the features in both datasets.
Originally, both datasets were regression datasets with a numeric target. A mapping was therefore carried out to convert the objective feature from a number to a class variable (string). For example, a value of ‘1’ for the ‘overall danger index’ feature was converted to ‘A’, and a value of ‘5’ was converted to ‘E’.
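The text only gives the end points of this mapping. Assuming the intermediate values follow the same pattern (2 to ‘B’, 3 to ‘C’, 4 to ‘D’), which yields the five class labels mentioned above, the conversion could be sketched as follows; danger_index_to_label is a hypothetical helper, not part of the original study.

```python
def danger_index_to_label(value):
    """Map a danger-index value in 1..5 to the class label 'A'..'E'."""
    # Integral floats such as 3.0 (common after reading a CSV file) are accepted.
    if value != int(value) or not 1 <= value <= 5:
        raise ValueError(f"expected an integer from 1 to 5, got {value!r}")
    return "ABCDE"[int(value) - 1]

# Hypothetical column of target values.
print([danger_index_to_label(v) for v in [1, 3, 5.0]])  # prints ['A', 'C', 'E']
```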
Figures 2 and 3 depict the correlation matrices for Dataset 1, which consists of 10 features (excluding the class feature), and Dataset 2, which consists of 7 features (excluding the class feature), respectively.
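A correlation matrix of this kind holds the pairwise Pearson correlation between every two feature columns. The sketch below assumes the missing values have already been imputed and that the features are stored as a dictionary of equal-length lists; the column names and values are made up for illustration and are not taken from the Kaggle files.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant column has zero spread and an undefined correlation
    # (this line then raises ZeroDivisionError).
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Pairwise Pearson correlations between named feature columns (dict of lists)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical column names and values, for illustration only.
features = {
    "body_weight": [1.0, 2.5, 60.0, 0.02],
    "brain_weight": [5.5, 12.0, 400.0, 0.4],
    "gestation_time": [42.0, 60.0, 280.0, 19.0],
}
matrix = correlation_matrix(features)
print(round(matrix["body_weight"]["brain_weight"], 3))
```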

4. Results

4.1. Identifying the Best Regression Model

Identifying the best regression model was the first objective of this research. To meet this objective, twenty-three regression models belonging to five well-known learning strategies were considered and evaluated.
The Function learning strategy was represented by four models: GaussianProcesses, LinearRegression, MultilayerPerceptron, and SMOreg. Three models were used to represent the Lazy learning strategy: IBK, KStar, and LWL. For the meta-learning strategy, the following eight regression models were considered: AdditiveRegression, Bagging, RandomCommittee, RandomizableFilteredClassifier, RandomSubSpace, RegressionByDiscretization, Stacking, and Vote. The Rules learning strategy was represented by DecisionTable, M5Rules, and ZeroR. Finally, five models were used to represent the Tree learning strategy: DecisionStump, M5P, RandomForest, RandomTree, and REPTree.
It is worth mentioning that all these models were used with their default settings and parameters, except for the IBK algorithm, where the KNN parameter was changed from 1 to 3.
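For regression, a nearest-neighbor model such as IBK predicts a numeric value from the targets of the closest training instances. The following minimal sketch shows such a k-nearest-neighbor regressor with k = 3; the unweighted mean of the neighbors' targets and the Euclidean distance on min-max-scaled features are our assumptions about reasonable defaults, and knn_regress is a hypothetical name, not the implementation used in the study.

```python
import math

def knn_regress(train_X, train_y, query, k=3):
    """Predict a numeric target as the unweighted mean target of the k nearest
    training rows (Euclidean distance on min-max-scaled features)."""
    n_features = len(query)
    # Feature ranges come from the training rows only, so that large-scale
    # features (e.g. body weight) do not dominate the distance.
    lows = [min(row[j] for row in train_X) for j in range(n_features)]
    highs = [max(row[j] for row in train_X) for j in range(n_features)]

    def scale(row):
        # Constant features carry no distance information and are mapped to 0.
        return [(row[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
                for j in range(n_features)]

    q = scale(query)
    ranked = sorted((math.dist(scale(row), q), target)
                    for row, target in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)

# k=3 mirrors the IBK setting reported above; the data are made up.
print(knn_regress([[0.0], [1.0], [2.0], [10.0]], [1.0, 2.0, 3.0, 100.0], [1.0], k=3))  # prints 2.0
```

Raising k from 1 to 3 averages over more neighbors, which typically makes predictions less sensitive to a single noisy training instance.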
The regression models were evaluated on both datasets (Dataset 1 and Dataset 2) with respect to five well-known evaluation metrics: correlation coefficient (CC), mean absolute error (MAE), root mean squared error (RMSE), relative absolute error (RAE), and root relative squared error (RRSE). These metrics were computed using the following equations:
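$$CC = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2 \sum_{i=1}^{n}(\hat{y}_i - \bar{\hat{y}})^2}}$$
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$
$$RAE = \frac{\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|}{\sum_{i=1}^{n}\left|y_i - \bar{y}\right|}$$
$$RRSE = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
where $y_i$ and $\hat{y}_i$ are the actual and predicted values of the i-th of n instances, and $\bar{y}$ and $\bar{\hat{y}}$ are the means of the actual and predicted values. RAE is thus the absolute error of the model divided by the absolute error of a naive model that always predicts $\bar{y}$.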
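The five regression metrics can be computed together as in the following sketch. It follows the equations above, with the naive model in RAE and RRSE taken to be the mean of the supplied actual values; some toolkits report RAE and RRSE as percentages, whereas this sketch returns plain ratios. The function name regression_metrics is hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """CC, MAE, RMSE, RAE and RRSE for paired actual/predicted values."""
    n = len(actual)
    mean_y = sum(actual) / n
    mean_p = sum(predicted) / n
    errors = [y - p for y, p in zip(actual, predicted)]
    cov = sum((y - mean_y) * (p - mean_p) for y, p in zip(actual, predicted))
    ss_y = sum((y - mean_y) ** 2 for y in actual)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    # Total absolute error of the naive model that always predicts mean_y.
    naive_abs = sum(abs(y - mean_y) for y in actual)
    return {
        "CC": cov / math.sqrt(ss_y * ss_p),
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "RAE": sum(abs(e) for e in errors) / naive_abs,
        "RRSE": math.sqrt(sum(e * e for e in errors) / ss_y),
    }
```

CC rewards models whose predictions move with the actual values, while the four error metrics penalize the size of the errors; a model can therefore rank differently under CC than under MAE or RMSE, which is one reason to report all five.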
Table 3 presents the CC results for the twenty-three regression models on both datasets. For this metric, the higher the value, the better the performance.
According to Table 3 and considering Dataset 1, several models achieved strong results, such as GaussianProcesses, MultilayerPerceptron, SMOreg, IBK, RegressionByDiscretization, and RandomForest. The best regression model was MultilayerPerceptron, which belongs to the Function learning strategy, and the second best was SMOreg, which also belongs to the Function strategy. For Dataset 2, MultilayerPerceptron and SMOreg achieved the best results among the twenty-three considered regression models.
Table 4 presents the MAE results for the twenty-three regression models on both datasets; for this metric, the lower the value, the better the performance. According to Table 4 and considering Dataset 1, RegressionByDiscretization, which belongs to the meta-learning strategy, achieved the best (lowest) result compared with the other twenty-two regression models, and MultilayerPerceptron achieved the second best value. It is worth mentioning that MAE alone is not sufficient to assess regression models; therefore, this research considered other evaluation metrics as well. For Dataset 2, SMOreg achieved the best result, followed by KStar. SMOreg belongs to the Function learning strategy, while KStar belongs to the Lazy learning strategy.
Table 5 shows the RMSE results on both datasets for the same twenty-three regression models. For the RMSE metric, the lower the value, the better the performance. From Table 5, and considering Dataset 1, MultilayerPerceptron and SMOreg from the Function learning strategy achieved the two best results, respectively. Moreover, RegressionByDiscretization, GaussianProcesses, and IBK achieved acceptable results compared with the other regression models considered in this research. For Dataset 2, IBK and KStar achieved the two best results, respectively.
Table 6 presents the RAE results for the twenty-three regression models on both datasets. For the RAE metric, the lower the value, the better the predictive performance. According to Table 6, and considering Dataset 1, MultilayerPerceptron and SMOreg achieved the two best results, respectively; both belong to the Function learning strategy. The third best model was IBK, which belongs to the Lazy learning strategy. For Dataset 2, SMOreg achieved the best RAE result, followed by KStar.
Table 7 presents the RRSE results for the twenty-three regression models on both datasets. For this metric, the lower the value, the better the predictive performance. Considering Dataset 1, and according to Table 7, MultilayerPerceptron and SMOreg were the two best regression models, respectively, and RegressionByDiscretization from the meta-learning strategy achieved the third best result.
Considering Dataset 2, KStar from the Lazy learning strategy achieved the best RRSE result, followed by SMOreg from the Function learning strategy.
Table 8 summarizes the previous tables in order to identify the best regression model among the twenty-three considered models. In Table 8, MLP is short for MultilayerPerceptron, and RBD is short for RegressionByDiscretization.
According to Table 8, MLP achieved the best results on Dataset 1, while SMOreg achieved the second best results on the same dataset. For Dataset 2, SMOreg achieved the best results, followed by KStar. Since no single model was best on both datasets, it can be concluded that an ensemble built from MLP, SMOreg, and KStar is a promising way to handle the prediction task for sleep disorder datasets.
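The simplest way to combine several regressors is to average their predictions for each instance. The sketch below assumes each model's predictions are already available as aligned lists; the model names are used only as dictionary keys, the prediction values are made up, and average_ensemble is a hypothetical helper. The Vote meta-learner evaluated in this study offers an averaging combination rule in the same spirit.

```python
from statistics import fmean

def average_ensemble(predictions_by_model):
    """Combine regressors by averaging, instance by instance, the predictions of
    all models. predictions_by_model maps a model name to a list of predictions
    aligned on the same instances."""
    columns = list(predictions_by_model.values())
    if len({len(c) for c in columns}) != 1:
        raise ValueError("all models must predict the same number of instances")
    return [fmean(values) for values in zip(*columns)]

# Made-up predictions for two instances, for illustration only.
print(average_ensemble({"MLP": [1.0, 2.0], "SMOreg": [3.0, 2.0], "KStar": [2.0, 5.0]}))  # prints [2.0, 3.0]
```

Averaging tends to help most when the combined models make different errors, which matches the observation that MLP, SMOreg, and KStar lead on different datasets and metrics.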

4.2. Identifying the Best Classification Model

This section aims to identify the best classification algorithm for the problem of sleep disorders. The evaluation phase in this section considered twenty-nine classification models belonging to six learning strategies.
These classification models were:
  • Bayes learning strategy: BayesNet, NaiveBayes, and NaiveBayesUpdateable
  • Functions learning strategy: Logistic, MultilayerPerceptron, SimpleLogistic, and SMO
  • Lazy learning strategy: IBK, KStar, and LWL
  • Meta learning strategy: Bagging, ClassificationViaRegression, FilteredClassifier, LogitBoost, MultiClassClassifier, RandomCommittee, RandomizableFilteredClassifier, RandomSubSpace, and Vote
  • Rules learning strategy: DecisionTable, JRip, OneR, PART, and ZeroR
  • Trees learning strategy: J48, LMT, RandomTree, RandomForest, and REPTree
Moreover, the evaluation phase in this section considered five well-known metrics: accuracy, precision, recall, F1-measure, and the Matthews correlation coefficient (MCC). These metrics were computed using the following equations, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively:
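$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{F1\text{-}measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$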
For all the previously mentioned metrics, the higher the value, the better the performance of the classification model.
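The equations above are stated for a single positive class, whereas the datasets here have five class labels. The paper does not state how the metrics were extended to the multiclass case; a common choice, assumed in the sketch below, is to treat each class in turn as positive (one-vs-rest) and to weight the per-class precision, recall, F1-measure, and MCC by class frequency, while accuracy is the overall fraction of correct predictions. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1-measure and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "F1": f1,
        # MCC is set to 0 here when any marginal count is zero.
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

def multiclass_metrics(actual, predicted):
    """Overall accuracy plus class-frequency-weighted one-vs-rest precision,
    recall, F1-measure and MCC."""
    n = len(actual)
    pairs = list(zip(actual, predicted))
    result = defaultdict(float)
    for cls, support in Counter(actual).items():
        # Treat cls as the positive class and every other label as negative.
        tp = sum(a == cls and p == cls for a, p in pairs)
        fp = sum(a != cls and p == cls for a, p in pairs)
        fn = sum(a == cls and p != cls for a, p in pairs)
        per_class = binary_metrics(tp, n - tp - fp - fn, fp, fn)
        for name in ("precision", "recall", "F1", "MCC"):
            result[name] += per_class[name] * support / n
    result["accuracy"] = sum(a == p for a, p in pairs) / n
    return dict(result)
```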
Table 9 shows the accuracy and precision results for the twenty-nine classification models on the two datasets. According to Table 9, IBK and RandomForest achieved the highest accuracy and precision on Dataset 1. For Dataset 2, IBK showed the best results among the twenty-nine classifiers with respect to both accuracy and precision. Moreover, RandomizableFilteredClassifier also achieved the best accuracy result on Dataset 2 and the second best precision result on the same dataset.
Figure 4 depicts the tree constructed for Dataset 1 when RandomTree is used as the classification model.
Table 10 presents the recall and F1-measure results for the twenty-nine classifiers on both datasets. According to Table 10, IBK achieved the best recall results on both datasets and the best F1-measure result on Dataset 1. RandomForest achieved the best F1-measure result on Dataset 2 and, together with IBK, the best recall result on Dataset 1.
Table 11 presents the MCC results for the considered classifiers on both datasets. Based on Table 11, IBK, which belongs to the Lazy learning strategy, achieved the best MCC results on both datasets. RandomForest, which belongs to the Trees learning strategy, also achieved the best MCC result on Dataset 2.
Table 12 summarizes the best results obtained in Tables 9 to 11 with respect to the five evaluation metrics on both datasets. In Table 12, RF stands for RandomForest, and RFC stands for RandomizableFilteredClassifier.
According to Table 12, IBK and RandomForest were the best classification models for Dataset 1, while IBK, RandomizableFilteredClassifier, and RandomForest were the best classification models for Dataset 2.

5. Discussion

This section presents a comparative analysis of the regression and classification models that best handle the task of predicting sleep disorders. The analysis considered two datasets and several evaluation metrics.
Regarding the best regression model, no single regression model showed consistently high performance across all metrics on both datasets. Therefore, it is highly recommended to use ensemble methods for this task that combine the best regression models identified in Section 4.1 (MultilayerPerceptron, SMOreg, and KStar).
Regarding the best classification model, IBK showed superior performance compared with the other models. Nevertheless, other classification models, such as RandomForest and RandomizableFilteredClassifier, also performed excellently. Hence, it is highly recommended to combine these three classification models (IBK, RandomForest, and RandomizableFilteredClassifier) in an ensemble for classifying sleep disorders.
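One straightforward way to combine the three recommended classifiers is a per-instance majority vote over their predicted labels. The sketch below assumes the predictions are already available as aligned lists of labels; the tie-breaking rule (the earliest-listed model wins among tied labels) is our own choice, the labels are made up, and majority_vote is a hypothetical helper.

```python
from collections import Counter

def majority_vote(predictions_by_model):
    """Combine class predictions per instance by majority vote; ties go to the
    earliest-listed model among those whose label has the top count."""
    names = list(predictions_by_model)
    n = len(predictions_by_model[names[0]])
    combined = []
    for i in range(n):
        votes = [predictions_by_model[name][i] for name in names]
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(next(label for label in votes if counts[label] == top))
    return combined

# Made-up labels for three instances, for illustration only.
print(majority_vote({
    "IBK": ["A", "B", "C"],
    "RandomForest": ["A", "C", "D"],
    "RandomizableFilteredClassifier": ["B", "C", "E"],
}))  # prints ['A', 'C', 'C']
```

Listing IBK first means that, when all three models disagree, the ensemble falls back on the classifier that performed best in Tables 9 to 11.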
Moreover, regarding the best learning strategy for the sleep disorder problem, the following strategies showed excellent performance: Lazy, Functions, Trees, and Meta. In more detail, Table 13 shows the average results of the considered models grouped by the learning strategy they belong to. The shaded rows represent Dataset 2, while the unshaded rows represent Dataset 1.
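The paper does not detail how the per-strategy averages in Table 13 were formed. Assuming a simple unweighted mean of each metric over the models in a strategy, the computation for the regression models listed in Section 4.1 can be sketched as follows; average_by_strategy is a hypothetical helper, and for the error metrics (MAE, RMSE, RAE, RRSE) a lower average is better.

```python
from statistics import fmean

# Regression models grouped by learning strategy, as listed in Section 4.1.
REGRESSION_STRATEGIES = {
    "Function": ["GaussianProcesses", "LinearRegression", "MultilayerPerceptron", "SMOreg"],
    "Lazy": ["IBK", "KStar", "LWL"],
    "Meta": ["AdditiveRegression", "Bagging", "RandomCommittee",
             "RandomizableFilteredClassifier", "RandomSubSpace",
             "RegressionByDiscretization", "Stacking", "Vote"],
    "Rules": ["DecisionTable", "M5Rules", "ZeroR"],
    "Tree": ["DecisionStump", "M5P", "RandomForest", "RandomTree", "REPTree"],
}

def average_by_strategy(scores, strategies):
    """Unweighted mean of one metric over the models of each learning strategy.

    scores maps a model name to its metric value; unscored models are skipped.
    """
    averages = {}
    for strategy, models in strategies.items():
        values = [scores[m] for m in models if m in scores]
        if values:
            averages[strategy] = fmean(values)
    return averages
```

Because a single weak model can pull a strategy's mean down, an average ranking of strategies need not agree with a ranking of their best individual models.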
According to Table 13, the Lazy learning strategy was the best learning strategy for the regression task on sleep disorder datasets, considering the five metrics, and Functions was the second best.
Considering the classification task, Table 13 clearly shows that the Functions strategy was the best choice for Dataset 1, while the Trees strategy was the best choice for Dataset 2, for all five evaluation metrics. This suggests that the Functions strategy is more suitable for datasets with a larger number of features, while the Trees learning strategy is more efficient for datasets with fewer features.
Once again, based on Table 13, the Functions strategy showed superior performance when both tasks (regression and classification) are considered together. Therefore, it was the most appropriate strategy for prediction tasks on sleep disorder datasets.
Finally, it is highly recommended to conduct more integrated research involving experts from both the machine learning and sleep disorder domains. Considering new features beyond those in the utilized datasets is also highly recommended.

6. Conclusions and Future Work

Sleep disorders involve problems with the amount, timing, and quality of sleep, which result in several daytime problems such as fatigue, stress, and impaired functioning. This research aimed to add knowledge to this domain by investigating the applicability of machine learning techniques to sleep disorders. Three objectives were considered: to identify the best regression model, the best classification model, and the best learning strategy for sleep disorder datasets. The results showed that MultilayerPerceptron, SMOreg, and KStar were the best regression models, and IBK, RandomForest, and RandomizableFilteredClassifier were the best classification models. Finally, the Function learning strategy showed superior performance compared with the other strategies, considering both regression and classification tasks in both datasets, with strong competition from the Lazy and Trees strategies. For future work, an ensemble learning model that combines the best regression and classification models is highly recommended.

Author Contributions

Conceptualization, R.A., G.S. and M.A. (Mohammad Aljaidi); data curation, R.A. and M.H.Q.; formal analysis, G.S., M.A. (Mohammad Aljaidi) and A.A.; funding acquisition, M.A. (Mohammed Alshammari); investigation, R.A. and G.S.; methodology, R.A., M.A. (Mohammad Aljaidi) and A.A.; project administration, G.S. and M.A. (Mohammad Aljaidi); resources, R.A. and A.A.; software, R.A. and M.H.Q.; supervision, M.A. (Mohammad Aljaidi); validation, R.A., A.A., M.H.Q. and M.A. (Mohammed Alshammari); writing (original draft), R.A. and M.A. (Mohammad Aljaidi); writing (review and editing), M.H.Q., A.A. and M.A. (Mohammed Alshammari). All authors have read and agreed to the published version of the manuscript.

Funding

The authors extend their appreciation to the Deanship of Scientific Research at Northern Border University, Arar, KSA, for funding this research work through project number “NBU-FFR-2023-0116”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors would like to extend their sincere appreciation to Zarqa University and Northern Border University for supporting this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, M.-M.; Ma, Y.; Du, L.-T.; Wang, K.; Li, Z.; Zhu, W.; Sun, Y.-H.; Lu, L.; Bao, Y.-P.; Li, S.-X. Sleep disorders and non-sleep circadian disorders predict depression: A systematic review and meta-analysis of longitudinal studies. Neurosci. Biobehav. Rev. 2022, 134, 104532.
  2. Greenlund, I.M.; Carter, J.R. Sympathetic neural responses to sleep disorders and insufficiencies. Am. J. Physiol.-Heart Circ. Physiol. 2022, 322, H337–H349.
  3. Hu, X.; Li, J.; Wang, X.; Liu, H.; Wang, T.; Lin, Z.; Xiong, N. Neuroprotective Effect of Melatonin on Sleep Disorders Associated with Parkinson's Disease. Antioxidants 2023, 12, 396.
  4. Sheta, A.; Thaher, T.; Surani, S.R.; Turabieh, H.; Braik, M.; Too, J.; Abu-El-Rub, N.; Mafarjah, M.; Chantar, H.; Subramanian, S. Diagnosis of Obstructive Sleep Apnea Using Feature Selection, Classification Methods, and Data Grouping Based Age, Sex, and Race. Diagnostics 2023, 13, 2417.
  5. Controne, I.; Scoditti, E.; Buja, A.; Pacifico, A.; Kridin, K.; Del Fabbro, M.; Garbarino, S.; Damiani, G. Do Sleep Disorders and Western Diet Influence Psoriasis? A Scoping Review. Nutrients 2022, 14, 4324.
  6. Alzyoud, M.; Alazaidah, R.; Aljaidi, M.; Samara, G.; Qasem, M.; Khalid, M.; Al-Shanableh, N. Diagnosing diabetes mellitus using machine learning techniques. International Journal of Data and Network Science 2024, 8, 179–188.
  7. Aiyer, I.; Shaik, L.; Sheta, A.; Surani, S. Review of Application of Machine Learning as a Screening Tool for Diagnosis of Obstructive Sleep Apnea. Medicina 2022, 58, 1574.
  8. Sheta, A.; Turabieh, H.; Thaher, T.; Too, J.; Mafarja, M.; Hossain, S.; Surani, S.R. Diagnosis of obstructive sleep apnea from ECG signals using machine learning and deep learning classifiers. Appl. Sci. 2021, 11, 6622.
  9. Alazaidah, R.; Samara, G.; Almatarneh, S.; Hassan, M.; Aljaidi, M.; Mansur, H. Multi-Label Classification Based on Associations. Appl. Sci. 2023, 13, 5081.
  10. Alazaidah, R.; Almaiah, M.A. Associative classification in multi-label classification: An investigative study. Jordanian J. Comput. Inf. Technol. 2021, 7, 166–179.
  11. Alazaidah, R.; Ahmad, F.K.; Mohsin, M. Multi label ranking based on positive pairwise correlations among labels. Int. Arab. J. Inf. Technol. 2020, 17, 440–449.
  12. Haj Qasem, M.; Aljaidi, M.; Samara, G.; Alazaidah, R.; Alsarhan, A.; Alshammari, M. An Intelligent Decision Support System Based on Multi Agent Systems for Business Classification Problem. Sustainability 2023, 15, 10977.
  13. Al-Batah, M.S.; Alzyoud, M.; Alazaidah, R.; Toubat, M.; Alzoubi, H.; Olaiyat, A. Early Prediction of Cervical Cancer Using Machine Learning Techniques. Jordanian J. Comput. Inf. Technol. 2022, 8, 357–369.
  14. Junoh, A.K.; AlZoubi, W.A.; Alazaidah, R.; Al-luwaici, W. New features selection method for multi-label classification based on the positive dependencies among labels. Solid State Technol. 2020, 63.
  15. Alluwaici, M.A.; Junoh, A.K.; Alazaidah, R. New problem transformation method based on the local positive pairwise dependencies among labels. J. Inf. Knowl. Manag. 2020, 19, 2040017.
  16. Junoh, A.K.; Ahmad, F.K.; Mohsen, M.F.M.; Alazaidah, R. Open research directions for multi label learning. In Proceedings of the 2018 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE), Penang, Malaysia, 28–29 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 125–128.
  17. Alazaidah, R.; Ahmad, F.K.; Mohsen, M.F.M. A comparative analysis between the three main approaches that are being used to solve the problem of multi label classification. Int. J. Soft Comput. 2017, 12, 218–223.
  18. Alazaidah, R.; Ahmad, F.K.; Mohsen, M.F.M.; Junoh, A.K. Evaluating conditional and unconditional correlations capturing strategies in multi label classification. J. Telecommun. Electron. Comput. Eng. (JTEC) 2018, 10, 47–51.
  19. AlShourbaji, I.; Samara, G.; abu Munshar, H.; Zogaan, W.A.; Reegu, F.A.; Aliero, M.S. Early detection of skin cancer using deep learning approach. Elem. Educ. Online 2021, 20, 3880–3884.
  20. Sobri, M.Z.A.; Redhwan, A.; Ameen, F.; Lim, J.W.; Liew, C.S.; Mong, G.R.; Daud, H.; Sokkalingam, R.; Ho, C.-D.; Usman, A.; et al. A review unveiling various machine learning algorithms adopted for biohydrogen productions from microalgae. Fermentation 2023, 9, 243.
  21. Pentoś, K.; Mbah, J.T.; Pieczarka, K.; Niedbała, G.; Wojciechowski, T. Evaluation of multiple linear regression and machine learning approaches to predict soil compaction and shear stress based on electrical parameters. Appl. Sci. 2022, 12, 8791.
  22. Mora-Garcia, R.T.; Cespedes-Lopez, M.F.; Perez-Sanchez, V.R. Housing Price Prediction Using Machine Learning Algorithms in COVID-19 Times. Land 2022, 11, 2100.
  23. Ammer, M.A.; Aldhyani, T.H. Deep learning algorithm to predict cryptocurrency fluctuation prices: Increasing investment awareness. Electronics 2022, 11, 2349.
  24. Oyeleye, M.; Chen, T.; Titarenko, S.; Antoniou, G. A predictive analysis of heart rates using machine learning techniques. Int. J. Environ. Res. Public Health 2022, 19, 2417.
  25. Al-Buraihy, E.; Dan, W.; Khan, R.U.; Ullah, M. An ML-Based Classification Scheme for Analyzing the Social Network Reviews of Yemeni People. Int. Arab. J. Inf. Technol. 2022, 19, 904–914.
  26. Al Khaldy, M.; Alauthman, M.; Al-Sanea, M.S.; Samara, G. Improve Class Prediction By Balancing Class Distribution For Diabetes Dataset. International Journal of Scientific & Technology Research 2020, 9(4).
  27. Hu, X.; Li, J.; Wang, X.; Liu, H.; Wang, T.; Lin, Z.; Xiong, N. Neuroprotective Effect of Melatonin on Sleep Disorders Associated with Parkinson's Disease. Antioxidants 2023, 12, 396.
  28. Anbarasi, L.J.; Jawahar, M.; Ravi, V.; Cherian, S.M.; Shreenidhi, S.; Sharen, H. Machine learning approach for anxiety and sleep disorders analysis during COVID-19 lockdown. Health Technol. 2022, 12, 825–838.
  29. Bitkina, O.V.; Park, J.; Kim, J. Modeling sleep quality depending on objective actigraphic indicators based on machine learning methods. Int. J. Environ. Res. Public Health 2022, 19, 9890. [Google Scholar] [CrossRef]
  30. Crivello, A.; Palumbo, F.; Barsocchi, P.; La Rosa, D.; Scarselli, F.; Bianchini, M. Understanding human sleep behaviour by machine learning. In Cognitive Infocommunications, Theory and Applications; Springer: Cham, Switzerland, 2019; pp. 227–252. [Google Scholar] [CrossRef]
  31. Satapathy, S.; Loganathan, D.; Kondaveeti, H.K.; Rath, R. Performance analysis of machine learning algorithms on automated sleep staging feature sets. CAAI Trans. Intell. Technol. 2021, 6, 155–174. [Google Scholar] [CrossRef]
  32. Kazimipour, B., Boostani, R., Borhani-Haghighi, A., Almatarneh, S. and Aljaidi, M., 2022, November. EEG-Based Discrimination Between Patients with MCI and Alzheimer. In 2022 International Engineering Conference on Electrical, Energy, and Artificial Intelligence (EICEEAI) (pp. 1-5). IEEE. [CrossRef]
Figure 1. Main phases of the research methodology.
Figure 2. Correlation matrix for Dataset 1.
Figure 3. Correlation matrix for Dataset 2.
Figure 4. Tree constructed for Dataset 1 by the RandomTree classifier.
Table 1. Dataset characteristics.
| Characteristic | Dataset 1 | Dataset 2 |
|---|---|---|
| Type | Classification, Regression | Classification, Regression |
| Instances | 62 | 62 |
| Features | 11 | 8 |
| Missing values | Yes | Yes |
Table 2. Features and their main characteristics.
| No. | Name | Type | Minimum | Maximum |
|---|---|---|---|---|
| 1 | Species | Nominal | - | - |
| 2 | Body weight (kg) | Real | 0.005 | 6654 |
| 3 | Brain weight (g) | Real | 0.14 | 5712 |
| 4 | Slow wave (h/day) | Real | 2.1 | 17.9 |
| 5 | Paradoxical (h/day) | Real | 0 | 6.6 |
| 6 | Total sleep (h/day) | Real | 2.6 | 19.9 |
| 7 | Maximum life span (years) | Real | 2 | 100 |
| 8 | Gestation time (days) | Real | 12 | 645 |
| 9 | Predation index (1–5) | Integer | 1 | 5 |
| 10 | Exposure index (1–5) | Integer | 1 | 5 |
| 11 | Overall danger index (1–5) | Integer | 1 | 5 |
Table 3. Correlation coefficient (CC) results for the twenty-three regression models on both datasets.
| Strategy | Model | Dataset 1 CC | Dataset 2 CC |
|---|---|---|---|
| Functions | GaussianProcesses | 0.929 | 0.605 |
| | LinearRegression | 0.570 | 0.576 |
| | MultilayerPerceptron | 0.954 | 0.699 |
| | SMOreg | 0.950 | 0.684 |
| Lazy | IBK | 0.911 | 0.634 |
| | KStar | 0.818 | 0.679 |
| | LWL | 0.811 | 0.628 |
| Meta | AdditiveRegression | 0.845 | 0.494 |
| | Bagging | −0.249 | 0.655 |
| | RandomCommittee | 0.837 | 0.639 |
| | RandomizableFilteredClassifier | 0.348 | 0.602 |
| | RandomSubSpace | 0.859 | 0.609 |
| | RegressionByDiscretization | 0.933 | 0.538 |
| | Stacking | −0.287 | −0.497 |
| | Vote | −0.287 | −0.497 |
| Rules | DecisionTable | 0.859 | 0.527 |
| | M5Rules | 0.000 | 0.587 |
| | ZeroR | −0.287 | −0.497 |
| Trees | DecisionStump | 0.737 | 0.485 |
| | M5P | 0.000 | 0.588 |
| | RandomForest | 0.903 | 0.608 |
| | RandomTree | 0.759 | 0.453 |
| | REPTree | −0.287 | 0.416 |
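The CC in Table 3 is, presumably, Pearson's correlation between predicted and actual values. The minimal sketch below (plain Python; the helper names `pearson` and `loo_mean_predictions` and the toy target values are hypothetical) also illustrates one plausible reason why a mean predictor such as ZeroR can receive a negative CC under cross-validation: each held-out prediction is the mean of the remaining training values, so a large actual value pulls its own prediction down. With leave-one-out evaluation the effect is extreme and the correlation is exactly −1.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def loo_mean_predictions(ys):
    """Leave-one-out predictions of a ZeroR-style regressor (mean of the other values)."""
    total = sum(ys)
    n = len(ys)
    return [(total - y) / (n - 1) for y in ys]


# Hypothetical target values (e.g., total sleep in h/day); not taken from the datasets.
actual = [3.3, 8.3, 12.5, 16.5, 9.1, 19.9]
predicted = loo_mean_predictions(actual)
print(round(pearson(actual, predicted), 6))  # -1.0
```

Because each leave-one-out prediction is a decreasing linear function of the held-out value, the correlation is −1 regardless of the data; with 10-fold cross-validation the effect is weaker but can still push the CC below zero.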
Table 4. Mean absolute error (MAE) results for the twenty-three regression models on both datasets.
| Strategy | Model | Dataset 1 MAE | Dataset 2 MAE |
|---|---|---|---|
| Functions | GaussianProcesses | 0.445 | 3.089 |
| | LinearRegression | 1.111 | 2.975 |
| | MultilayerPerceptron | 0.328 | 3.128 |
| | SMOreg | 0.351 | 2.747 |
| Lazy | IBK | 0.355 | 2.847 |
| | KStar | 0.596 | 2.752 |
| | LWL | 0.694 | 2.793 |
| Meta | AdditiveRegression | 0.568 | 3.401 |
| | Bagging | 1.290 | 2.804 |
| | RandomCommittee | 0.867 | 2.923 |
| | RandomizableFilteredClassifier | 1.177 | 3.390 |
| | RandomSubSpace | 0.918 | 2.969 |
| | RegressionByDiscretization | 0.293 | 3.314 |
| | Stacking | 1.277 | 3.741 |
| | Vote | 1.277 | 3.741 |
| Rules | DecisionTable | 0.540 | 3.022 |
| | M5Rules | 1.290 | 2.886 |
| | ZeroR | 1.277 | 3.741 |
| Trees | DecisionStump | 0.811 | 3.306 |
| | M5P | 1.290 | 2.885 |
| | RandomForest | 0.816 | 2.953 |
| | RandomTree | 0.730 | 3.958 |
| | REPTree | 1.277 | 3.415 |
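MAE is the average absolute difference between predicted and actual values and is expressed in the units of the target attribute; if the two datasets use different target attributes, their MAE columns in Table 4 are not directly comparable. A minimal sketch with hypothetical values (the function name is illustrative):

```python
def mean_absolute_error(actual, predicted):
    """Average of |predicted - actual|, in the units of the target attribute."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


actual = [3.0, 5.0, 8.0]       # hypothetical actual values
predicted = [2.5, 5.0, 10.0]   # hypothetical predictions
print(mean_absolute_error(actual, predicted))  # 0.8333333333333334
```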
Table 5. Root mean squared error (RMSE) results for the twenty-three regression models on both datasets.
| Strategy | Model | Dataset 1 RMSE | Dataset 2 RMSE |
|---|---|---|---|
| Functions | GaussianProcesses | 0.561 | 3.855 |
| | LinearRegression | 1.276 | 4.039 |
| | MultilayerPerceptron | 0.435 | 3.891 |
| | SMOreg | 0.451 | 3.619 |
| Lazy | IBK | 0.596 | 3.335 |
| | KStar | 0.834 | 3.478 |
| | LWL | 0.848 | 3.656 |
| Meta | AdditiveRegression | 0.806 | 4.430 |
| | Bagging | 1.455 | 3.761 |
| | RandomCommittee | 1.014 | 3.562 |
| | RandomizableFilteredClassifier | 1.571 | 4.415 |
| | RandomSubSpace | 1.040 | 3.629 |
| | RegressionByDiscretization | 0.530 | 3.968 |
| | Stacking | 1.443 | 4.696 |
| | Vote | 1.443 | 4.696 |
| Rules | DecisionTable | 0.739 | 3.923 |
| | M5Rules | 1.481 | 3.850 |
| | ZeroR | 1.443 | 4.696 |
| Trees | DecisionStump | 0.991 | 4.076 |
| | M5P | 1.481 | 3.847 |
| | RandomForest | 0.949 | 3.652 |
| | RandomTree | 0.940 | 5.068 |
| | REPTree | 1.443 | 4.299 |
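RMSE squares each error before averaging, so it penalizes large errors more heavily than MAE and is never smaller than MAE for the same predictions; indeed, every model in Table 5 has an RMSE at or above its MAE in Table 4. The sketch below (hypothetical values and function names) shows two prediction sets with the same MAE, where concentrating the error in one instance doubles the RMSE:

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared difference between predictions and actual values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [3.0, 5.0, 8.0, 6.0]
even = [4.0, 4.0, 9.0, 5.0]     # four errors of size 1
uneven = [3.0, 5.0, 8.0, 10.0]  # a single error of size 4
print(mean_absolute_error(actual, even), root_mean_squared_error(actual, even))      # 1.0 1.0
print(mean_absolute_error(actual, uneven), root_mean_squared_error(actual, uneven))  # 1.0 2.0
```

A large gap between a model's RMSE and MAE therefore hints that its errors are concentrated in a few instances.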
Table 6. Relative absolute error (RAE, %) results for the twenty-three regression models on both datasets.
| Strategy | Model | Dataset 1 RAE (%) | Dataset 2 RAE (%) |
|---|---|---|---|
| Functions | GaussianProcesses | 34.838 | 82.556 |
| | LinearRegression | 86.974 | 79.514 |
| | MultilayerPerceptron | 25.648 | 83.607 |
| | SMOreg | 27.455 | 73.415 |
| Lazy | IBK | 27.779 | 76.082 |
| | KStar | 46.653 | 73.560 |
| | LWL | 54.301 | 74.643 |
| Meta | AdditiveRegression | 44.449 | 90.902 |
| | Bagging | 100.948 | 74.951 |
| | RandomCommittee | 67.888 | 78.120 |
| | RandomizableFilteredClassifier | 92.176 | 90.598 |
| | RandomSubSpace | 71.827 | 79.364 |
| | RegressionByDiscretization | 22.921 | 88.588 |
| | Stacking | 100.000 | 100.000 |
| | Vote | 100.000 | 100.000 |
| Rules | DecisionTable | 42.295 | 80.772 |
| | M5Rules | 101.014 | 77.135 |
| | ZeroR | 100.000 | 100.000 |
| Trees | DecisionStump | 63.493 | 88.351 |
| | M5P | 101.014 | 77.097 |
| | RandomForest | 63.900 | 78.928 |
| | RandomTree | 57.124 | 105.800 |
| | REPTree | 100.000 | 91.268 |
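RAE divides a model's total absolute error by the total absolute error of a naive predictor that always outputs the mean, and multiplies by 100. A value of 100% therefore means no improvement over that baseline, which is consistent with ZeroR (a mean predictor) scoring exactly 100.000 in Table 6; with their default settings, WEKA's Stacking and Vote wrap ZeroR, which would likely explain why their rows match ZeroR's. Values above 100% (e.g., Bagging and M5Rules on Dataset 1, RandomTree on Dataset 2) indicate a model that did worse than the baseline. The sketch below assumes the baseline is the mean of the supplied actual values; WEKA derives its baseline from the training data, so cross-validated figures may differ slightly. Names and values are hypothetical.

```python
def relative_absolute_error(actual, predicted):
    """RAE in percent: the model's total absolute error relative to a mean predictor's."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean = sum(actual) / len(actual)  # assumption: baseline = mean of these actual values
    model_error = sum(abs(p - a) for a, p in zip(actual, predicted))
    baseline_error = sum(abs(mean - a) for a in actual)
    if baseline_error == 0:
        raise ValueError("RAE is undefined when all actual values are equal")
    return 100.0 * model_error / baseline_error


actual = [2.0, 4.0, 6.0, 8.0]   # hypothetical actual values, mean 5.0
mean_predictor = [5.0] * 4      # always predicts the mean, like ZeroR
good_model = [2.5, 4.0, 6.0, 7.5]
print(relative_absolute_error(actual, mean_predictor))  # 100.0
print(relative_absolute_error(actual, good_model))      # 12.5
```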
Table 7. Root relative squared error (RRSE, %) results for the twenty-three regression models on both datasets.
| Strategy | Model | Dataset 1 RRSE (%) | Dataset 2 RRSE (%) |
|---|---|---|---|
| Functions | GaussianProcesses | 38.906 | 82.097 |
| | LinearRegression | 88.471 | 86.005 |
| | MultilayerPerceptron | 30.147 | 82.860 |
| | SMOreg | 31.261 | 77.065 |
| Lazy | IBK | 41.287 | 84.413 |
| | KStar | 57.792 | 74.074 |
| | LWL | 58.755 | 77.849 |
| Meta | AdditiveRegression | 55.858 | 94.345 |
| | Bagging | 100.865 | 73.712 |
| | RandomCommittee | 70.285 | 78.853 |
| | RandomizableFilteredClassifier | 108.880 | 94.027 |
| | RandomSubSpace | 72.113 | 77.283 |
| | RegressionByDiscretization | 36.720 | 84.498 |
| | Stacking | 100.000 | 100.000 |
| | Vote | 100.000 | 100.000 |
| Rules | DecisionTable | 51.193 | 83.544 |
| | M5Rules | 102.653 | 81.980 |
| | ZeroR | 100.000 | 100.000 |
| Trees | DecisionStump | 68.667 | 86.801 |
| | M5P | 102.653 | 81.929 |
| | RandomForest | 65.769 | 77.767 |
| | RandomTree | 65.167 | 107.926 |
| | REPTree | 100.000 | 91.541 |
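RRSE is the squared-error counterpart of RAE: the square root of the model's sum of squared errors divided by that of the mean predictor, expressed in percent. Equivalently, it is the model's RMSE divided by the mean predictor's RMSE, so it is unitless and easier to compare across datasets than the raw RMSE values. A minimal sketch, assuming the baseline is the mean of the supplied actual values (names and values are hypothetical):

```python
import math


def root_relative_squared_error(actual, predicted):
    """RRSE in percent: sqrt(model SSE / mean-predictor SSE) * 100."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean = sum(actual) / len(actual)  # assumption: baseline = mean of these actual values
    model_sse = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    baseline_sse = sum((mean - a) ** 2 for a in actual)
    if baseline_sse == 0:
        raise ValueError("RRSE is undefined when all actual values are equal")
    return 100.0 * math.sqrt(model_sse / baseline_sse)


actual = [2.0, 4.0, 6.0, 8.0]      # hypothetical actual values, mean 5.0 (baseline SSE = 20)
predicted = [3.0, 6.0, 6.0, 8.0]   # errors 1, 2, 0, 0 (model SSE = 5)
print(root_relative_squared_error(actual, predicted))  # 50.0
print(root_relative_squared_error(actual, [5.0] * 4))  # 100.0
```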
Table 8. Recapitulation table identifying the best regression models for each evaluation metric (MLP = MultilayerPerceptron; RBD = RegressionByDiscretization).
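| Dataset | CC | MAE | RMSE | RAE | RRSE |
|---|---|---|---|---|---|
| Dataset 1 | MLP | MLP | MLP | MLP | MLP |
| | SMOreg | RBD | SMOreg | SMOreg | SMOreg |
| Dataset 2 | MLP | SMOreg | IBK | SMOreg | SMOreg |
| | SMOreg | KStar | KStar | KStar | KStar |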
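Building a summary such as Table 8 requires knowing the direction of each metric: CC is better when higher, whereas MAE, RMSE, RAE and RRSE are better when lower. The sketch below ranks models under that rule and returns the two strongest; the dictionaries hold a subset of the Dataset 1 values from Table 3 (CC) and Table 5 (RMSE), and the helper name `top_two` is hypothetical.

```python
HIGHER_IS_BETTER = {"CC": True, "MAE": False, "RMSE": False, "RAE": False, "RRSE": False}


def top_two(scores, metric):
    """Return the names of the two best models for `metric` (ties broken by name)."""
    higher = HIGHER_IS_BETTER[metric]
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1] if higher else kv[1], kv[0]))
    return [name for name, _ in ranked[:2]]


# Subset of Dataset 1 values copied from Table 3 (CC) and Table 5 (RMSE).
cc = {"GaussianProcesses": 0.929, "MultilayerPerceptron": 0.954, "SMOreg": 0.950, "IBK": 0.911}
rmse = {"GaussianProcesses": 0.561, "MultilayerPerceptron": 0.435, "SMOreg": 0.451, "IBK": 0.596}
print(top_two(cc, "CC"))      # ['MultilayerPerceptron', 'SMOreg']
print(top_two(rmse, "RMSE"))  # ['MultilayerPerceptron', 'SMOreg']
```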
Table 9. Accuracy and precision results for the twenty-nine classification models on both datasets.
| Strategy | Classifier | Dataset 1 Accuracy (%) | Dataset 1 Precision | Dataset 2 Accuracy (%) | Dataset 2 Precision |
|---|---|---|---|---|---|
| Bayes | BayesNet | 71.642 | 0.742 | 67.000 | 0.671 |
| | NaiveBayes | 80.597 | 0.833 | 59.000 | 0.590 |
| | NaiveBayesUpdateable | 80.597 | 0.833 | 59.000 | 0.590 |
| Functions | Logistic | 80.597 | 0.836 | 83.000 | 0.833 |
| | MultilayerPerceptron | 91.045 | 0.914 | 66.000 | 0.667 |
| | SimpleLogistic | 91.045 | 0.915 | 67.000 | 0.670 |
| | SMO | 92.537 | 0.928 | 66.300 | 0.660 |
| Lazy | IBK | 94.030 | 0.940 | 86.000 | 0.867 |
| | KStar | 86.567 | 0.874 | 84.000 | 0.842 |
| | LWL | 68.657 | 0.667 | 36.000 | 0.294 |
| Meta | Bagging | 74.627 | 0.778 | 65.000 | 0.615 |
| | ClassificationViaRegression | 79.105 | 0.836 | 57.000 | 0.571 |
| | FilteredClassifier | 76.119 | 0.788 | 65.000 | 0.579 |
| | LogitBoost | 91.045 | 0.914 | 82.000 | 0.800 |
| | MultiClassClassifier | 85.075 | 0.869 | 69.000 | 0.667 |
| | RandomCommittee | 91.045 | 0.916 | 83.000 | 0.842 |
| | RandomizableFilteredClassifier | 85.075 | 0.853 | 86.000 | 0.859 |
| | RandomSubSpace | 88.060 | 0.893 | 66.000 | 0.671 |
| | Vote | 14.925 | 0.143 | 16.000 | 0.160 |
| Rules | DecisionTable | 79.105 | 0.863 | 50.000 | 0.583 |
| | JRip | 80.597 | 0.826 | 58.000 | 0.667 |
| | OneR | 76.119 | 0.927 | 38.000 | 0.458 |
| | PART | 88.060 | 0.884 | 66.000 | 0.700 |
| | ZeroR | 14.925 | 0.143 | 16.000 | 0.160 |
| Trees | J48 | 88.060 | 0.890 | 63.000 | 0.625 |
| | LMT | 91.045 | 0.915 | 85.000 | 0.800 |
| | RandomForest | 94.030 | 0.940 | 84.000 | 0.833 |
| | RandomTree | 80.597 | 0.819 | 82.000 | 0.833 |
| | REPTree | 74.627 | 0.788 | 60.000 | 0.571 |
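Accuracy in Table 9 is the percentage of correctly classified instances, while the precision of a class is the fraction of instances predicted as that class that truly belong to it. For multi-class data the per-class precisions must be averaged; the table does not state how, so the sketch below assumes weights equal to the number of actual instances in each class (the "weighted average" convention of WEKA's class-detail summary). Treating an undefined precision (a class that is never predicted) as 0 is also a choice made here. The function names and confusion matrix are hypothetical.

```python
def accuracy_percent(cm):
    """cm[i][j] = number of instances of actual class i predicted as class j."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return 100.0 * correct / total


def weighted_precision(cm):
    """Per-class precision averaged with weights equal to each class's actual count."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    result = 0.0
    for c in range(k):
        predicted_as_c = sum(cm[i][c] for i in range(k))
        precision = cm[c][c] / predicted_as_c if predicted_as_c else 0.0  # 0 if never predicted
        result += precision * sum(cm[c]) / total
    return result


# Hypothetical 3-class confusion matrix (rows: actual class, columns: predicted class).
cm = [[8, 2, 0],
      [1, 6, 1],
      [0, 1, 1]]
print(accuracy_percent(cm))              # 75.0
print(round(weighted_precision(cm), 3))  # 0.761
```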
Table 10. Recall and F1-measure results for the twenty-nine classification models on both datasets.
| Strategy | Classifier | Dataset 1 Recall | Dataset 1 F1-measure | Dataset 2 Recall | Dataset 2 F1-measure |
|---|---|---|---|---|---|
| Bayes | BayesNet | 0.716 | 0.720 | 0.670 | 0.667 |
| | NaiveBayes | 0.806 | 0.806 | 0.590 | 0.706 |
| | NaiveBayesUpdateable | 0.806 | 0.806 | 0.590 | 0.706 |
| Functions | Logistic | 0.806 | 0.804 | 0.830 | 0.870 |
| | MultilayerPerceptron | 0.910 | 0.910 | 0.660 | 0.737 |
| | SimpleLogistic | 0.910 | 0.910 | 0.670 | 0.667 |
| | SMO | 0.925 | 0.925 | 0.680 | 0.632 |
| Lazy | IBK | 0.940 | 0.947 | 0.860 | 0.876 |
| | KStar | 0.866 | 0.864 | 0.840 | 0.869 |
| | LWL | 0.687 | 0.671 | 0.307 | 0.296 |
| Meta | Bagging | 0.746 | 0.750 | 0.650 | 0.667 |
| | ClassificationViaRegression | 0.791 | 0.796 | 0.570 | 0.591 |
| | FilteredClassifier | 0.761 | 0.764 | 0.650 | 0.629 |
| | LogitBoost | 0.910 | 0.912 | 0.820 | 0.849 |
| | MultiClassClassifier | 0.851 | 0.850 | 0.688 | 0.697 |
| | RandomCommittee | 0.910 | 0.911 | 0.830 | 0.859 |
| | RandomizableFilteredClassifier | 0.851 | 0.850 | 0.860 | 0.870 |
| | RandomSubSpace | 0.881 | 0.881 | 0.660 | 0.667 |
| | Vote | 0.149 | 0.144 | 0.160 | 0.164 |
| Rules | DecisionTable | 0.791 | 0.809 | 0.500 | 0.533 |
| | JRip | 0.806 | 0.806 | 0.580 | 0.600 |
| | OneR | 0.761 | 0.805 | 0.380 | 0.500 |
| | PART | 0.881 | 0.880 | 0.700 | 0.765 |
| | ZeroR | 0.149 | 0.146 | 0.160 | 0.216 |
| Trees | J48 | 0.881 | 0.881 | 0.630 | 0.625 |
| | LMT | 0.910 | 0.910 | 0.850 | 0.842 |
| | RandomForest | 0.940 | 0.941 | 0.840 | 0.889 |
| | RandomTree | 0.806 | 0.808 | 0.820 | 0.859 |
| | REPTree | 0.746 | 0.753 | 0.600 | 0.667 |
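The recall of a class is the fraction of its actual instances that were predicted correctly, and F1 is the harmonic mean of precision and recall. One property helps when reading Table 10: if per-class recall is averaged with weights equal to the class counts, the result equals overall accuracy, and indeed the Dataset 1 recall of every model matches its Table 9 accuracy divided by 100 (e.g., IBK: 94.030% and 0.940). The sketch below assumes that class-count weighting and computes F1 per class before averaging; the function names and confusion matrix are hypothetical.

```python
def per_class_recall_f1(cm):
    """Return (recall, f1) lists per class from a confusion matrix (rows: actual class)."""
    k = len(cm)
    recalls, f1s = [], []
    for c in range(k):
        actual_c = sum(cm[c])
        predicted_c = sum(cm[i][c] for i in range(k))
        recall = cm[c][c] / actual_c if actual_c else 0.0
        precision = cm[c][c] / predicted_c if predicted_c else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)
    return recalls, f1s


def weighted_average(values, cm):
    """Average per-class values with weights equal to each class's actual count."""
    total = sum(sum(row) for row in cm)
    return sum(v * sum(row) / total for v, row in zip(values, cm))


cm = [[8, 2, 0],
      [1, 6, 1],
      [0, 1, 1]]
recalls, f1s = per_class_recall_f1(cm)
accuracy = sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)
print(round(weighted_average(recalls, cm), 3), accuracy)  # 0.75 0.75
```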
Table 11. Matthews correlation coefficient (MCC) results for the twenty-nine classification models on both datasets.
| Strategy | Classifier | Dataset 1 MCC | Dataset 2 MCC |
|---|---|---|---|
| Bayes | BayesNet | 0.651 | 0.664 |
| | NaiveBayes | 0.763 | 0.719 |
| | NaiveBayesUpdateable | 0.763 | 0.719 |
| Functions | Logistic | 0.766 | 0.861 |
| | MultilayerPerceptron | 0.887 | 0.721 |
| | SimpleLogistic | 0.887 | 0.700 |
| | SMO | 0.904 | 0.687 |
| Lazy | IBK | 0.930 | 0.862 |
| | KStar | 0.831 | 0.859 |
| | LWL | 0.677 | 0.293 |
| Meta | Bagging | 0.686 | 0.634 |
| | ClassificationViaRegression | 0.749 | 0.634 |
| | FilteredClassifier | 0.710 | 0.645 |
| | LogitBoost | 0.887 | 0.853 |
| | MultiClassClassifier | 0.819 | 0.704 |
| | RandomCommittee | 0.888 | 0.853 |
| | RandomizableFilteredClassifier | 0.810 | 0.862 |
| | RandomSubSpace | 0.854 | 0.693 |
| | Vote | −0.143 | 0.176 |
| Rules | DecisionTable | 0.778 | 0.595 |
| | JRip | 0.757 | 0.667 |
| | OneR | 0.787 | 0.457 |
| | PART | 0.849 | 0.808 |
| | ZeroR | −0.143 | 0.256 |
| Trees | J48 | 0.852 | 0.692 |
| | LMT | 0.887 | 0.839 |
| | RandomForest | 0.926 | 0.862 |
| | RandomTree | 0.757 | 0.861 |
| | REPTree | 0.691 | 0.612 |
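MCC uses all four cells of a binary confusion matrix and ranges from −1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). The datasets here are multi-class and the table does not state how per-class values were combined; the sketch below assumes the one-vs-rest convention, computing a binary MCC for each class and averaging with weights equal to the class counts, which is the form of WEKA's weighted-average summary row. Other definitions, such as the multi-class generalization computed from the full confusion matrix, give different numbers. Returning 0 when the denominator vanishes is a convention chosen here; function names and the confusion matrix are hypothetical.

```python
import math


def binary_mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for one binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined


def weighted_one_vs_rest_mcc(cm):
    """Per-class one-vs-rest MCC averaged with weights equal to class counts."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    result = 0.0
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[i][c] for i in range(k)) - tp
        tn = total - tp - fn - fp
        result += binary_mcc(tp, tn, fp, fn) * sum(cm[c]) / total
    return result


print(binary_mcc(tp=5, tn=5, fp=0, fn=0))  # 1.0 (perfect)
print(binary_mcc(tp=0, tn=0, fp=5, fn=5))  # -1.0 (always wrong)
cm = [[8, 2, 0],
      [1, 6, 1],
      [0, 1, 1]]
print(round(weighted_one_vs_rest_mcc(cm), 3))
```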
Table 12. Recapitulation table identifying the best classification models for each evaluation metric (RF = RandomForest; RFC = RandomizableFilteredClassifier).
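| Dataset | Accuracy | Precision | Recall | F1-measure | MCC |
|---|---|---|---|---|---|
| Dataset 1 | IBK | IBK | IBK | IBK | IBK |
| | RF | RF | RF | RF | RF |
| Dataset 2 | IBK | IBK | IBK | RF | IBK |
| | RFC | RFC | RFC | IBK | RF |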
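Some entries of Table 12 reflect exact ties: IBK and RandomForest both reach 94.030% accuracy on Dataset 1, and IBK and RandomizableFilteredClassifier both reach 86.000% on Dataset 2 (Table 9). A selection that keeps tied models together, rather than breaking ties arbitrarily, can be sketched as follows; the helper name `best_models` and its tolerance parameter are hypothetical choices.

```python
def best_models(scores, higher_is_better=True, tol=1e-9):
    """Return every model whose score ties the best score within `tol`, sorted by name."""
    best = max(scores.values()) if higher_is_better else min(scores.values())
    return sorted(name for name, s in scores.items() if abs(s - best) <= tol)


# Dataset 1 accuracy (%) for a few classifiers, copied from Table 9.
accuracy_ds1 = {"IBK": 94.030, "RandomForest": 94.030, "SMO": 92.537, "LMT": 91.045}
print(best_models(accuracy_ds1))  # ['IBK', 'RandomForest']
```

Because the tables round scores to three decimals, a small tolerance also groups models whose underlying scores differ only beyond the reported precision.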
Table 13. Best learning strategy results (average score of the models within each strategy).
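| Task | Metric | Dataset | Bayes | Function | Lazy | Meta | Rules | Trees |
|---|---|---|---|---|---|---|---|---|
| Regression | CC | Dataset 1 | - | 0.851 | 0.847 | 0.375 | 0.191 | 0.422 |
| | CC | Dataset 2 | - | 0.641 | 0.647 | 0.318 | 0.206 | 0.510 |
| | MAE | Dataset 1 | - | 0.559 | 0.548 | 0.744 | 1.036 | 0.985 |
| | MAE | Dataset 2 | - | 2.985 | 2.797 | 3.082 | 3.216 | 3.303 |
| | RMSE | Dataset 1 | - | 0.681 | 0.759 | 1.163 | 1.221 | 1.161 |
| | RMSE | Dataset 2 | - | 3.851 | 3.490 | 4.145 | 4.156 | 4.188 |
| | RAE (%) | Dataset 1 | - | 43.729 | 42.911 | 75.026 | 81.103 | 77.106 |
| | RAE (%) | Dataset 2 | - | 79.773 | 74.762 | 87.815 | 85.969 | 88.289 |
| | RRSE (%) | Dataset 1 | - | 47.196 | 52.611 | 80.590 | 84.615 | 80.451 |
| | RRSE (%) | Dataset 2 | - | 82.007 | 78.779 | 87.840 | 88.508 | 89.193 |
| Classification | Accuracy (%) | Dataset 1 | 77.612 | 88.806 | 83.085 | 76.120 | 67.761 | 85.672 |
| | Accuracy (%) | Dataset 2 | 61.667 | 70.575 | 68.667 | 65.444 | 45.600 | 74.800 |
| | Precision | Dataset 1 | 0.803 | 0.898 | 0.827 | 0.777 | 0.729 | 0.870 |
| | Precision | Dataset 2 | 0.617 | 0.708 | 0.668 | 0.640 | 0.514 | 0.732 |
| | Recall | Dataset 1 | 0.776 | 0.888 | 0.831 | 0.761 | 0.678 | 0.857 |
| | Recall | Dataset 2 | 0.617 | 0.710 | 0.669 | 0.654 | 0.464 | 0.748 |
| | F1-measure | Dataset 1 | 0.777 | 0.887 | 0.827 | 0.762 | 0.689 | 0.859 |
| | F1-measure | Dataset 2 | 0.693 | 0.727 | 0.680 | 0.666 | 0.523 | 0.776 |
| | MCC | Dataset 1 | 0.726 | 0.861 | 0.843 | 0.696 | 0.606 | 0.823 |
| | MCC | Dataset 2 | 0.701 | 0.742 | 0.716 | 0.673 | 0.557 | 0.773 |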
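The entries of Table 13 appear to be unweighted means of the per-model scores within each learning strategy: averaging the Dataset 1 CC values of Table 3 reproduces the first regression row to within rounding (for example, the four Functions models give (0.929 + 0.570 + 0.954 + 0.950)/4 ≈ 0.851). The sketch below recomputes that row. Because a single weak model (such as ZeroR within Rules, or Stacking and Vote within Meta) pulls a strategy's mean down, these averages describe a strategy as a whole rather than its best member.

```python
from statistics import mean

# Dataset 1 CC values copied from Table 3, grouped by learning strategy.
cc_dataset1 = {
    "Function": [0.929, 0.570, 0.954, 0.950],
    "Lazy": [0.911, 0.818, 0.811],
    "Meta": [0.845, -0.249, 0.837, 0.348, 0.859, 0.933, -0.287, -0.287],
    "Rules": [0.859, 0.000, -0.287],
    "Trees": [0.737, 0.000, 0.903, 0.759, -0.287],
}

# Unweighted mean per strategy, to compare with the first regression row of Table 13.
strategy_means = {name: mean(values) for name, values in cc_dataset1.items()}
for name, value in strategy_means.items():
    print(f"{name}: {value:.3f}")
```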
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.