Preprint
Article

Explainable Machine Learning Techniques to Predict Muscle Injuries in Professional Soccer Players from Biomechanical Analysis

Submitted: 30 October 2023

Posted: 31 October 2023


A peer-reviewed article of this preprint also exists.

Abstract
There is a significant risk of injury in sports and intense competition due to their demanding physical and psychological requirements. Hamstring strain injuries (HSI) are the most prevalent type of injury among professional soccer players and the leading cause of missed days in the sport. These injuries stem from a combination of factors, making it challenging to pinpoint the most crucial risk factors and their interactions, let alone find effective prevention strategies. Recently, there has been growing recognition of the potential of tools provided by artificial intelligence (AI). However, current studies concentrate primarily on enhancing the performance of complex machine learning (ML) models, often overlooking their explanatory capabilities. Consequently, medical teams find these models difficult to interpret and are hesitant to trust them fully. In light of this, there is an increasing need for advanced injury detection and prediction models that can help doctors diagnose or detect injuries earlier and with greater accuracy. Accordingly, this study aims to identify biomarkers of muscle injuries in professional soccer players through biomechanical analysis, employing several ML algorithms, including decision tree (DT) methods, discriminant methods, logistic regression, naive Bayes, support vector machines (SVM), k-nearest neighbors (KNN), ensemble methods, boosted and bagged trees, artificial neural networks (ANN), and XGBoost. XGBoost was also used to obtain the most important features. The findings highlight that the variables that most effectively differentiate the groups, and that could serve as reliable predictors for injury prevention, are the maximum muscle strength of the hamstrings and the stiffness of the same muscle.
Of the 35 techniques employed, XGBoost achieved the highest accuracy, up to 78%, indicating that by considering scientific evidence, suggestions based on various data sources, and expert opinions, it is possible to attain good accuracy and thus enhance the reliability of the results for doctors and trainers. Furthermore, the results align strongly with the existing literature, although further sport-specific studies are necessary to draw a definitive conclusion.
Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

In sports and intense competition, the demanding physical and psychological pressures create a substantial risk of injury. These injuries affect not only the athletes but also coaches, sponsors, teams, and clubs, compounded by the substantial medical expenses involved [1,2,3,4]. In sports like soccer, lower extremity injuries account for a significant share, amounting to 92% of all injuries [5,6,7,8]. The sport’s physical demands and specific characteristics contribute significantly to this high injury incidence [5,9]. Notably, hamstring strain injuries (HSI) are the most prevalent injury in football, making up 12% of all reported injuries. Alarmingly, the recurrence rate for this injury ranges between 12% and 41% within the first year of returning to the sport [1,2]. Despite this, a recent meta-analysis revealed that high-level soccer teams seemingly do not implement any injury prevention protocols [10].
Considering the multifactorial nature of injuries, identifying the most crucial risk factors and their interplay, and devising effective prevention strategies, present considerable challenges [11,12]. In response, attention has turned towards the potential of artificial intelligence tools [13]. By leveraging large datasets and predictive models, healthcare professionals can diagnose, predict, and treat their patients with greater confidence [14]. Examples include forecasting complications after cardiac surgery [15], predicting ICU mortality due to COVID-19 [16], anticipating outcomes following knee surgery [17], diagnosing pathologies in lumbar spine MRIs [18], and foreseeing surgical risks [19], among others. However, many machine learning (ML) models are not user-friendly for the people who interact with them. Understanding how a model reaches its decisions, termed explainability or interpretability, is critical for human users to comprehend and trust the machine’s decision-making process [20]. Although the development of explainable models is relatively recent, their value in interpreting ML models has been acknowledged by various experts [21,22]. Notably, decision trees stand out as prominent tools in healthcare, since many health professionals are already familiar with them in practice, where clinical, serological, or radiological data often underpin similar medical decisions [14]. With this in mind, this study aims to determine biomarkers of hamstring injuries in professional soccer players through biomechanical analysis using machine learning techniques, emphasizing a certain level of explainability. To achieve this, the study implements 35 machine learning classification algorithms, focusing explicitly on the XGBoost (eXtreme Gradient Boosting) technique.
XGBoost is an implementation of gradient-boosted decision trees that offers the added advantage of automatically providing estimates of feature importance from a trained predictive model [23].
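To make the idea of built-in feature importance more concrete, the following Python sketch boosts depth-1 regression trees (stumps) on the gradient of the logistic loss and credits each feature with the squared-error reduction ("gain") of the splits it wins. It is a simplified stand-in written for illustration, not XGBoost itself: XGBoost additionally uses second-order gradient statistics, regularized split gains, and deeper trees, and in the real library gain-based scores are read from the trained model (for example with `Booster.get_score(importance_type="gain")`). The function names and hyperparameters below are hypothetical.

```python
import math


def fit_stump(X, residuals):
    """Return (gain, feature, threshold, left_value, right_value) for the best single split."""
    n = len(residuals)
    base_sse = sum(r * r for r in residuals) - sum(residuals) ** 2 / n
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            sse = sum(r * r for r in left) - sum(left) ** 2 / len(left)
            sse += sum(r * r for r in right) - sum(right) ** 2 / len(right)
            gain = base_sse - sse  # reduction in squared error achieved by this split
            if best is None or gain > best[0]:
                best = (gain, j, threshold, sum(left) / len(left), sum(right) / len(right))
    return best


def boost(X, y, n_rounds=20, learning_rate=0.3):
    """Gradient boosting on log-odds; returns the stumps and normalized gain importance."""
    scores = [0.0] * len(X)  # current log-odds prediction for every sample
    importance = [0.0] * len(X[0])
    stumps = []
    for _ in range(n_rounds):
        probs = [1.0 / (1.0 + math.exp(-s)) for s in scores]
        residuals = [yi - pi for yi, pi in zip(y, probs)]  # negative gradient of log loss
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature has two distinct values to split on
            break
        gain, j, threshold, left_value, right_value = stump
        importance[j] += gain
        stumps.append((j, threshold, left_value, right_value))
        scores = [s + learning_rate * (left_value if row[j] <= threshold else right_value)
                  for s, row in zip(scores, X)]
    total = sum(importance)
    if total > 0:
        importance = [g / total for g in importance]
    return stumps, importance


def predict_proba(stumps, row, learning_rate=0.3):
    """Probability of class 1; learning_rate must match the value used in boost()."""
    score = sum(learning_rate * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)
    return 1.0 / (1.0 + math.exp(-score))


# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0], [5.0, 2.0], [6.0, 6.0]]
y = [0, 0, 0, 1, 1, 1]
stumps, importance = boost(X, y)
```

On this toy data every round splits on feature 0, so it receives all of the normalized importance, which is the kind of ranking the study reads off its XGBoost models.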

2. State of the Art

Thigh muscles are commonly injured in soccer players because the sport’s movement pattern, with rapid accelerations and decelerations, can cause the muscles to overstretch [11]. Risk factors for hamstring injuries are a matter of debate, and many studies have investigated possible predictors. A systematic review evaluating high-quality prospective studies in soccer players recognized previous injury as the only significant risk factor [11,24]. The authors also concluded that, among others, body mass index (BMI), height, weight, and player exposure were likely insignificant factors. In addition, there is evidence supporting age as a possible risk factor for hamstring injury in soccer players [5,25,26]. The evidence on the influence of muscle strength is conflicting: a high-quality prospective study found hamstring and quadriceps strength deficits to be weak risk factors for hamstring injury [24]. However, the authors of this study doubt its clinical relevance and do not recommend isokinetic strength testing to identify players at risk [27]. Quadriceps peak torque was considered a risk factor in a recent systematic review and meta-analysis, which found this marker in four soccer-related studies. Consequently, quadriceps peak torque may be considered a predictive factor in soccer, although more soccer-specific studies are needed for a conclusive statement [28]. A large number of potential predictors of hamstring injuries have been investigated, but there is currently insufficient evidence to draw firm conclusions. The most important factors reported are age [11,28,29,30], previous injury [11,28,29,31,32], increased quadriceps torque [11,28,30], asymmetry of eccentric hamstring strength [11,28], lower body stiffness [11,33,34], and performance on the single leg bridge test [11,28].
Furthermore, according to the book Return to Play in Football: An Evidence-Based Approach [11], psychological factors and playing position are also relevant.
Regarding ML techniques, few studies in the literature address explainable machine learning. Among those found, the most popular approaches are explainable classifiers [35,36,37], post hoc explanatory ML techniques [38,39,40,41,42], and feature selection [40,43]. Studies using feature selection observed that it improves the performance of prediction models and makes the results more interpretable. For instance, one study proposed a method to identify the most important features for assessing joint space narrowing progression in patients with knee osteoarthritis [44]. Another study employed fuzzy logic to combine multiple feature importance scores to identify and interpret knee osteoarthritis risk factors; the methodology selected a subset of risk factors that increased the accuracy of several ML models compared with popular selection techniques [43]. These results indicate that feature selection is a good option when explainability of the results is desired.

3. Soccer player injury classification architecture

In this study, we present an injury classification architecture based on four distinct biomechanical measures derived from professional soccer players. The architecture is depicted in Figure 1 and comprises various stages, including the collection of the dataset for biomechanical testing, pre-processing, classification, and the final classification results. We will provide a detailed explanation of each stage of the proposed architecture in the following sections.

3.1. Dataset for Biomechanical tests

In this work, 110 male professional soccer players were evaluated to build the proposed dataset. The evaluations were conducted by kinesiologists at the Biomechanics Laboratory of the Innovation Center, located within the MEDS Clinic in Santiago, Chile. Exclusion criteria were an injury within the last three months and a body mass index (BMI) below 24. All participants provided informed consent before participation, and adherence to the inclusion and exclusion criteria was verified before data acquisition. Anthropometric measurements, including weight, height, and segment length, were then obtained, and players performed specific warm-up exercises to activate their muscles. Each biomechanical test performed during the data acquisition stage is described in detail below.
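The screening step can be expressed as a small filter. The Python sketch below encodes only the two exclusion criteria stated above (an injury within the last three months and a BMI below 24); the `Participant` record, its field names, and the choice to count an injury exactly three months ago as "within" the window are hypothetical illustration choices, not details taken from the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    player_id: str
    weight_kg: float
    height_m: float
    # Months since the most recent injury; None if no injury is on record (hypothetical field).
    months_since_last_injury: Optional[float] = None


def bmi(p: Participant) -> float:
    """Body mass index in kg/m^2."""
    return p.weight_kg / p.height_m ** 2


def is_eligible(p: Participant, min_bmi: float = 24.0, injury_window_months: float = 3.0) -> bool:
    """Apply the two stated exclusion criteria (boundary handling is an assumption)."""
    recently_injured = (p.months_since_last_injury is not None
                        and p.months_since_last_injury <= injury_window_months)
    return not recently_injured and bmi(p) >= min_bmi


players = [
    Participant("P01", 80.0, 1.80),        # BMI about 24.7, no recent injury: included
    Participant("P02", 70.0, 1.80),        # BMI about 21.6: excluded
    Participant("P03", 85.0, 1.80, 1.5),   # injured 1.5 months ago: excluded
]
eligible = [p.player_id for p in players if is_eligible(p)]
```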

3.1.1. Biomechanical tests

  • Eccentric asymmetry force test (Nordic hamstring): The participant assumes a kneeling position with the hips and trunk aligned (see Figure 2a). The heels are secured, usually by an assistant and in this case by load cells, ensuring continuous contact with the ground during the exercise. The load cells measure the eccentric activation of the hamstring muscles. This test yields two parameters: maximum right hamstring eccentric force (N) and maximum left hamstring eccentric force (N).
  • Single leg bridge test: This clinical test assesses susceptibility to hamstring injury. The participant lies supine on the floor with the heel of the tested leg placed on a 60 cm high box. With hands crossed over the chest, the participant pushes through the heel to lift the glutes off the ground. Between repetitions, the glutes must touch the ground before being raised again, without resting (see Figure 2b). This test yields the number of repetitions for the right leg and for the left leg.
  • Muscle stiffness measure (myotonometry): This technique is an objective, non-invasive digital palpation method for superficial skeletal muscles. The measurement specifically targets the hamstring muscles (see Figure 2c) and is conducted with the MyotonPRO device. The parameters obtained for both limbs include S, stiffness (N/m), which reflects the resistance to a force or contraction that induces structural or tissue deformation.
  • Vertical jump test (Bosco test): This series of vertical jumps evaluates the morphophysiological (muscle fiber types), functional (jump heights and mechanical power), and neuromuscular (use of elastic energy, myotatic reflex, fatigue resistance) characteristics of the lower limb extensor muscles, based on the jump heights and mechanical power attained in different types of vertical jump. The Bosco test employed three jump types on a force platform (see Figure 2d): the countermovement jump, the squat jump, and the Abalakov jump, each performed both two-legged and one-legged.
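The text does not state how asymmetry or jump height were derived from the raw measurements. Purely as an illustration, the Python sketch below uses two common conventions: a between-limb asymmetry index expressed relative to the stronger limb, and jump height estimated from force-platform flight time as h = g·t²/8, which assumes the athlete lands in the same posture as at take-off. Both formulas and the function names are assumptions, not the study's documented procedure.

```python
G = 9.81  # gravitational acceleration in m/s^2


def asymmetry_percent(right: float, left: float) -> float:
    """Between-limb asymmetry: (stronger - weaker) / stronger * 100.

    Applies to any paired bilateral measure, e.g. Nordic eccentric force (N)
    or single leg bridge repetitions.
    """
    stronger, weaker = max(right, left), min(right, left)
    if stronger <= 0:
        raise ValueError("at least one limb must have a positive value")
    return (stronger - weaker) / stronger * 100.0


def jump_height_from_flight_time(flight_time_s: float) -> float:
    """Jump height in metres from flight time, h = g * t^2 / 8 (projectile motion)."""
    return G * flight_time_s ** 2 / 8.0


nordic_asym = asymmetry_percent(right=400.0, left=300.0)  # 25.0 (%)
cmj_height = jump_height_from_flight_time(0.5)            # about 0.307 m
```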

3.2. Pre-processing

In this study, a pre-processing stage was implemented to ensure the integrity and reliability of the data for the subsequent machine learning classification stage. Scaling and imputation techniques were applied to standardize the variables and handle missing data. Scaling was used to normalize the force measures to the body weight of each player. This adjustment aimed to ensure a fair evaluation by reducing the influence of participant-specific body differences. To address missing data, gaps were filled with zeros, since not all players were available for certain tests for various reasons, such as prior injuries. This approach preserved the completeness of the dataset by avoiding the exclusion of participants with partial data. Overall, these pre-processing methods prepared the data for analysis and interpretation.
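A minimal Python sketch of these two steps follows. It assumes each player is stored as a dictionary of raw measurements and that scaling to body weight means dividing force in newtons by body mass in kilograms (giving N/kg); the column names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical names of the force columns that are normalized to body mass.
FORCE_COLUMNS = {"nordic_max_right_n", "nordic_max_left_n"}


def preprocess(records: List[Dict[str, Optional[float]]], columns: List[str]) -> List[List[float]]:
    """Normalize force columns by body mass, then fill missing values with zeros."""
    rows = []
    for record in records:
        body_mass_kg = record["body_mass_kg"]
        row = []
        for column in columns:
            value = record.get(column)  # None (or absent) when the test was not performed
            if value is not None and column in FORCE_COLUMNS:
                value = value / body_mass_kg  # N -> N/kg
            row.append(0.0 if value is None else float(value))  # zero imputation
        rows.append(row)
    return rows


records = [
    {"body_mass_kg": 80.0, "nordic_max_right_n": 400.0, "nordic_max_left_n": None, "height_m": 1.80},
    {"body_mass_kg": 75.0, "nordic_max_right_n": 300.0, "nordic_max_left_n": 330.0, "height_m": 1.76},
]
matrix = preprocess(records, ["nordic_max_right_n", "nordic_max_left_n", "height_m"])
```

Normalization is applied before imputation so that a filled-in zero is never divided by body mass. A zero-filled cell is indistinguishable from a genuine zero measurement; adding a missing-value indicator column is one common alternative.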

3.3. Classification

Once pre-processed, the data were used to train several ML algorithms and evaluate their classification performance. For this, a feature matrix of dimensions 110 × 19 was assembled, with 110 rows representing participants and 19 columns representing biomechanical test results, anthropometric measurements, and position within the team (forward, defender, goalkeeper, or midfielder). Each sample in the dataset was assigned to one of two classes: class 0 indicates no lower limb muscle injury during the playing season, and class 1 indicates a lower limb muscle injury during the playing season.
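The layout of this matrix can be sketched in Python as follows. The text does not say how playing position was encoded among the 19 columns; the sketch assumes a one-hot encoding over the four positions, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

POSITIONS = ("forward", "defender", "goalkeeper", "midfielder")


def one_hot_position(position: str) -> List[float]:
    """One indicator column per playing position (the encoding is an assumption)."""
    if position not in POSITIONS:
        raise ValueError(f"unknown position: {position!r}")
    return [1.0 if position == p else 0.0 for p in POSITIONS]


def build_dataset(players: Sequence[Tuple[Sequence[float], str, bool]]):
    """Return the feature matrix X (one row per player) and binary labels y."""
    X, y = [], []
    for numeric_features, position, injured in players:
        X.append(list(numeric_features) + one_hot_position(position))
        y.append(1 if injured else 0)  # class 1: lower limb muscle injury during the season
    return X, y


X, y = build_dataset([
    ([5.0, 1.80], "goalkeeper", False),
    ([4.4, 1.76], "forward", True),
])
```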
To find the classifier that best fits our dataset, a total of 35 ML techniques were implemented, including decision tree (DT) methods, discriminant methods, logistic regression, naive Bayes, support vector machines (SVM), k-nearest neighbors (KNN), ensemble methods, boosted and bagged trees, artificial neural networks (ANN), and XGBoost. A brief description and the configuration used for each of the proposed ML models are presented in Table 1 and Table 3, respectively.
In addition, the feature importance analysis obtained from the XGBoost model was used to identify the most important and differentiating characteristics of the dataset. For this, multiple iterations were conducted, and the best-performing characteristics over N = 30 iterations were retained. Classification performance was validated using cross-validation and confusion matrices. These influential characteristics, deemed injury biomarkers, are the focus of the analysis in this study.
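As an illustration of the evaluation step, the Python sketch below pools out-of-fold predictions from stratified k-fold cross-validation into a single confusion matrix and derives accuracy, precision, and recall from it. The fold assignment is deterministic and unshuffled for readability (in practice samples are usually shuffled first), and the `fit`/`predict` callables are hypothetical placeholders for any of the classifiers listed above.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple


def stratified_folds(y: Sequence[int], k: int) -> List[List[int]]:
    """Deal the indices of each class round-robin over k folds."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        for n, i in enumerate(by_class[label]):
            folds[(n + offset) % k].append(i)
        offset += len(by_class[label])  # continue dealing where the previous class stopped
    return folds


def confusion_matrix(y_true, y_pred) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp) with class 1 (injured) as the positive class."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def scores(tn: int, fp: int, fn: int, tp: int) -> Tuple[float, float, float]:
    """Accuracy, precision, and recall; precision and recall are 0.0 when undefined."""
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def cross_validate(X, y, k: int, fit: Callable, predict: Callable):
    """Train on k-1 folds, predict the held-out fold, and pool all predictions."""
    y_pred = [None] * len(y)
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            y_pred[i] = predict(model, X[i])
    return confusion_matrix(y, y_pred)


# Usage with a trivial majority-class "model" on toy data.
y_toy = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
X_toy = [[float(i)] for i in range(10)]
cm = cross_validate(X_toy, y_toy, k=2,
                    fit=lambda Xs, ys: max(set(ys), key=ys.count),
                    predict=lambda model, x: model)
accuracy, precision, recall = scores(*cm)
```

On this toy data the majority-class model reaches 60% accuracy while never identifying an uninjured player (no true negatives), which is why a confusion matrix is worth reporting alongside accuracy.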

3.4. Most important features

To obtain the features that contribute most to class differentiation, the model was trained over N = 30 iterations, and the features that ranked highest across these iterations were retained. For this, the feature importance module of XGBoost was used.
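One way to aggregate importances over the N iterations is to count how often each feature reaches the top k of a run's importance ranking and to average its importance across runs. The Python sketch below assumes each iteration (for example, a retraining with a different random seed or data split, which the text does not specify) yields a mapping from feature name to importance score; the feature names and scores in the usage example are invented placeholders, not study data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def top_k(importances: Dict[str, float], k: int) -> List[str]:
    """Names of the k most important features in one run (ties broken alphabetically)."""
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def aggregate_runs(runs: List[Dict[str, float]], k: int = 3) -> List[Tuple[str, int, float]]:
    """Return (feature, times in top k, mean importance), most frequent first."""
    top_counts: Counter = Counter()
    totals: Dict[str, float] = defaultdict(float)
    for importances in runs:
        top_counts.update(top_k(importances, k))
        for name, value in importances.items():
            totals[name] += value
    summary = [(name, top_counts[name], total / len(runs)) for name, total in totals.items()]
    # Sort by how often the feature was in the top k, then by mean importance.
    return sorted(summary, key=lambda row: (-row[1], -row[2], row[0]))


# Placeholder importances from three runs (the study used N = 30 iterations).
runs = [
    {"f1": 0.5, "f2": 0.3, "f3": 0.2},
    {"f1": 0.4, "f2": 0.1, "f3": 0.5},
    {"f1": 0.6, "f2": 0.3, "f3": 0.1},
]
ranking = aggregate_runs(runs, k=1)
```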

3.5. Results

The testing results of the 35 ML algorithms are shown in Table 1, which also gives the configuration and description of each model. The accuracy of each model is shown in Figure 3. The best performance was obtained by the XGBoost technique, which reached 78%, followed by the SVM, decision tree, KNN, and logistic regression kernel techniques, all of which exceeded 70% accuracy.
Table 3 shows the most important characteristics obtained by XGBoost. The variable with the greatest weight in the classification, that is, the one repeated most often over the 30 iterations, was the maximum left hamstring strength, followed by right biceps femoris stiffness and semitendinosus stiffness. Figure 4 shows an example of the feature importance graph for the best-performing run, which contains the same variables as Table 3 and confirms that the most important feature is the maximum left hamstring force.

4. Discussion

Contemporary research focuses primarily on optimizing the performance of complex machine learning models, often neglecting their capacity for explanation. Consequently, healthcare professionals find these models hard to comprehend and struggle to trust their outputs [42,45,46,47]. Thus, there is a growing demand for advanced ML detection and prediction models that can help doctors diagnose diseases early and precisely [2,42,46]. Hence, both model performance and explainability are important for sound decision-making.
In research aiming to provide interpretable results, the most commonly employed ML techniques are random forests, decision trees, k-nearest neighbors (KNN), and support vector machines. These simpler models are typically favored when the emphasis lies on producing more comprehensible and interpretable models [47]. Notably, the construction of these models is informed by scientific evidence, suggestions drawn from various data sources, and expert opinions [14]. Some of these models were implemented in this work, and their performance is shown in Figure 3. The best performance was obtained with model Nº 35, the XGBoost model, which achieved an accuracy of 78%. This is consistent with another study that applied several ML techniques and also found that XGBoost performed best [48].
Figure 4 highlights the most influential iteration, in which the maximum force of the left hamstring is the variable with the highest weight. The features contributing most to the best results are detailed in Table 3. The findings suggest that the maximum muscle strength of the hamstrings and the stiffness of the same muscle are key variables distinguishing the groups and could serve as effective predictors for injury prevention. However, the evidence concerning the influence of muscle strength is conflicting. A high-quality prospective study indicated that hamstring and quadriceps strength deficits were weak risk factors for hamstring strain injuries (HSI), casting doubt on their clinical significance [25]. Conversely, a recent systematic review focused on strength training as the primary approach to prevention [5]. Nonetheless, further sport-specific studies are required to establish a definitive statement [5]. Regarding team differences, besides the significance of quadriceps strength mentioned earlier, disparities in player age have also been identified, and evidence supports age as a potential risk factor for hamstring injuries in soccer players [5,25]. Hence, it is essential to assess whether differences in the number of injuries per team are attributable to this factor and to review the type of training program each group employs, considering the potential impact of these differences on performance.

5. Conclusions

A notable gap exists between academic research outcomes and their practical implementation in medical practice. Medical professionals hesitate to rely on decisions generated by opaque black-box models that lack comprehensive, easily understandable explanations [49]. Consequently, ML techniques used in clinical settings typically avoid complex models in favor of simpler and more interpretable ones, albeit at the expense of accuracy or sophistication. In this context, applying the XGBoost technique instills confidence in the outcomes and offers a more interpretable perspective from a medical standpoint. The results from this technique indicate that good accuracy can be achieved by incorporating scientific evidence, suggestions grounded in diverse data sources, and expert opinions, thereby enhancing the trustworthiness of the results for doctors and trainers. Moreover, the results align strongly with the existing literature, although additional sport-specific studies remain necessary to establish a definitive statement.

Author Contributions

Conceptualization, Mailyn Calderón-Díaz, Rony Silvestre A., Roberto Yáñez, Matías Roby, Marvin Querales and Rodrigo Salas; Data curation, Mailyn Calderón-Díaz, Rony Silvestre A., Juan Vásconez, Roberto Yáñez, Matías Roby and Marvin Querales; Formal analysis, Mailyn Calderón-Díaz, Rony Silvestre A., Roberto Yáñez, Matías Roby, Marvin Querales and Rodrigo Salas; Funding acquisition, Mailyn Calderón-Díaz and Rodrigo Salas; Investigation, Mailyn Calderón-Díaz, Rony Silvestre A., Juan Vásconez, Roberto Yáñez, Matías Roby, Marvin Querales and Rodrigo Salas; Methodology, Mailyn Calderón-Díaz, Juan Vásconez and Rodrigo Salas; Resources, Mailyn Calderón-Díaz and Rodrigo Salas; Software, Mailyn Calderón-Díaz and Juan Vásconez; Supervision, Rodrigo Salas; Validation, Mailyn Calderón-Díaz and Juan Vásconez; Visualization, Mailyn Calderón-Díaz and Juan Vásconez; Writing – original draft, Mailyn Calderón-Díaz and Juan Vásconez; Writing – review & editing, Mailyn Calderón-Díaz, Juan Vásconez and Rodrigo Salas.

Acknowledgments

ANID funded this work – Millennium Science Initiative Program – ICN2021_004, ANID FONDECYT research grant number 1221938, and ANID - Subdirección de Capital Humano- 21221478. The authors also acknowledge the support provided by the Faculty of Engineering, Universidad Andres Bello, Santiago, Chile.

References

  1. Baroni, B.M.; Ruas, C.V.; Ribeiro-Alvares, J.B.; Pinto, R.S. Hamstring-to-quadriceps torque ratios of professional male soccer players: A systematic review. The Journal of Strength & Conditioning Research 2020, 34, 281–293.
  2. Lee, G.; Nho, K.; Kang, B.; Sohn, K.A.; Kim, D. Predicting Alzheimer’s disease progression using multi-modal deep learning approach. Scientific Reports 2019, 9, 1952.
  3. Cumps, E.; Verhagen, E.; Annemans, L.; Meeusen, R. Injury rate and socioeconomic costs resulting from sports injuries in Flanders: data derived from sports insurance statistics 2003. British Journal of Sports Medicine 2008, 42, 767–772.
  4. Calderón-Díaz, M.; Ulloa-Jiménez, R.; Saavedra, C.; Salas, R. Wavelet-based semblance analysis to determine muscle synergy for different handstand postures of Chilean circus athletes. Computer Methods in Biomechanics and Biomedical Engineering 2021, 24, 1053–1063.
  5. Rosado-Portillo, A.; Chamorro-Moriana, G.; Gonzalez-Medina, G.; Perez-Cabezas, V. Acute hamstring injury prevention programs in eleven-a-side football players based on physical exercises: systematic review. Journal of Clinical Medicine 2021, 10, 2029.
  6. Grazioli, R.; Lopez, P.; Andersen, L.L.; Machado, C.L.F.; Pinto, M.D.; Cadore, E.L.; Pinto, R.S. Hamstring rate of torque development is more affected than maximal voluntary contraction after a professional soccer match. European Journal of Sport Science 2019, 19, 1336–1341.
  7. Crema, M.D.; Guermazi, A.; Tol, J.L.; Niu, J.; Hamilton, B.; Roemer, F.W. Acute hamstring injury in football players: association between anatomical location and extent of injury, a large single-center MRI report. Journal of Science and Medicine in Sport 2016, 19, 317–322.
  8. Ekstrand, J.; Hägglund, M.; Kristenson, K.; Magnusson, H.; Waldén, M. Fewer ligament injuries but no preventive effect on muscle injuries and severe injuries: an 11-year follow-up of the UEFA Champions League injury study. British Journal of Sports Medicine 2013, 47, 732–737.
  9. Szymski, D.; Krutsch, V.; Achenbach, L.; Gerling, S.; Pfeifer, C.; Alt, V.; Krutsch, W.; Loose, O. Epidemiological analysis of injury occurrence and current prevention strategies on international amateur football level during the UEFA Regions Cup 2019. Archives of Orthopaedic and Trauma Surgery 2021, 1–10.
  10. Biz, C.; Nicoletti, P.; Baldin, G.; Bragazzi, N.L.; Crimì, A.; Ruggieri, P. Hamstring strain injury (HSI) prevention in professional and semi-professional football teams: a systematic review and meta-analysis. International Journal of Environmental Research and Public Health 2021, 18, 8272.
  11. Musahl, V.; Karlsson, J.; Krutsch, W.; Mandelbaum, B.R.; Espregueira-Mendes, J.; d’Hooghe, P.; et al. Return to Play in Football: An Evidence-Based Approach; Springer, 2018.
  12. Cos, F.; Cos, M.Á.; Buenaventura, L.; Pruna, R.; Ekstrand, J. Modelos de análisis para la prevención de lesiones en el deporte. Estudio epidemiológico de lesiones: el modelo Union of European Football Associations en el fútbol. Apunts. Medicina de l’Esport 2010, 45, 95–102.
  13. Halilaj, E.; Rajagopal, A.; Fiterau, M.; Hicks, J.L.; Hastie, T.J.; Delp, S.L. Machine learning in human movement biomechanics: Best practices, common pitfalls, and new opportunities. Journal of Biomechanics 2018, 81, 1–11.
  14. Handelman, G.; Kok, H.; Chandra, R.; Razavi, A.; Lee, M.; Asadi, H. eDoctor: machine learning and the future of medicine. Journal of Internal Medicine 2018, 284, 603–619.
  15. Tseng, P.Y.; Chen, Y.T.; Wang, C.H.; Chiu, K.M.; Peng, Y.S.; Hsu, S.P.; Chen, K.L.; Yang, C.Y.; Lee, O.K.S. Prediction of the development of acute kidney injury following cardiac surgery by machine learning. Critical Care 2020, 24, 1–13.
  16. Pan, S.L.; Zhang, S. From fighting COVID-19 pandemic to tackling sustainable development goals: An opportunity for responsible information systems research. International Journal of Information Management 2020, 55, 102196.
  17. Ramkumar, P.N.; Karnuta, J.M.; Haeberle, H.S.; Owusu-Akyaw, K.A.; Warner, T.S.; Rodeo, S.A.; Nwachukwu, B.U.; Williams III, R.J. Association between preoperative mental health and clinically meaningful outcomes after osteochondral allograft for cartilage defects of the knee: a machine learning analysis. The American Journal of Sports Medicine 2021, 49, 948–957.
  18. Jamaludin, A.; Lootus, M.; Kadir, T.; Zisserman, A.; Urban, J.; Battié, M.C.; Fairbank, J.; McCall, I. Automation of reading of radiological features from magnetic resonance images (MRIs) of the lumbar spine without human intervention is comparable with an expert radiologist. European Spine Journal 2017, 26, 1374–1383.
  19. Mak, W.K.; Bin Abd Razak, H.R.; Tan, H.C.A. Which Patients Require a Contralateral Total Knee Arthroplasty Within 5 Years of Index Surgery? The Journal of Knee Surgery 2019, 33, 1029–1033.
  20. Masís, S. Interpretable Machine Learning with Python: Learn to build interpretable high-performance models with hands-on real-world examples; Packt Publishing Ltd, 2021.
  21. Chen, H.; Michalopoulos, G.; Subendran, S.; Yang, R.; Quinn, R.; Oliver, M.; Butt, Z.; Wong, A. Interpretability of ML models for health data: a case study, 2019.
  22. Linardatos, P.; Papastefanopoulos, V.; Kotsiantis, S. Explainable AI: A review of machine learning interpretability methods. Entropy 2020, 23, 18.
  23. Brownlee, J. XGBoost With Python: Gradient boosted trees with XGBoost and scikit-learn; Machine Learning Mastery, 2016.
  24. Van Beijsterveldt, A.; van de Port, I.G.; Vereijken, A.; Backx, F. Risk factors for hamstring injuries in male soccer players: a systematic review of prospective studies. Scandinavian Journal of Medicine & Science in Sports 2013, 23, 253–262.
  25. Hägglund, M.; Waldén, M.; Ekstrand, J. Previous injury as a risk factor for injury in elite football: a prospective study over two consecutive seasons. British Journal of Sports Medicine 2006, 40, 767–772.
  26. Arnason, A.; Sigurdsson, S.B.; Gudmundsson, A.; Holme, I.; Engebretsen, L.; Bahr, R. Risk factors for injuries in football. The American Journal of Sports Medicine 2004, 32, 5–16.
  27. Van Dyk, N.; Bahr, R.; Whiteley, R.; Tol, J.L.; Kumar, B.D.; Hamilton, B.; Farooq, A.; Witvrouw, E. Hamstring and quadriceps isokinetic strength deficits are weak risk factors for hamstring strain injuries: a 4-year cohort study. The American Journal of Sports Medicine 2016, 44, 1789–1795.
  28. Freckleton, G.; Pizzari, T. Risk factors for hamstring muscle strain injury in sport: a systematic review and meta-analysis. British Journal of Sports Medicine 2013, 47, 351–358.
  29. Fousekis, K.; Tsepis, E.; Poulmedis, P.; Athanasopoulos, S.; Vagenas, G. Intrinsic risk factors of non-contact quadriceps and hamstring strains in soccer: a prospective study of 100 professional players. British Journal of Sports Medicine 2011, 45, 709–714.
  30. Henderson, G.; Barnes, C.A.; Portas, M.D. Factors associated with increased propensity for hamstring injury in English Premier League soccer players. Journal of Science and Medicine in Sport 2010, 13, 397–402.
  31. Fyfe, J.J.; Opar, D.A.; Williams, M.D.; Shield, A.J. The role of neuromuscular inhibition in hamstring strain injury recurrence. Journal of Electromyography and Kinesiology 2013, 23, 523–530.
  32. Warren, P.; Gabbe, B.J.; Schneider-Kolsky, M.; Bennell, K.L. Clinical predictors of time to return to competition and of recurrence following hamstring strain in elite Australian footballers. British Journal of Sports Medicine 2010, 44, 415–419.
  33. Blackburn, J.T.; Norcross, M.F. The effects of isometric and isotonic training on hamstring stiffness and anterior cruciate ligament loading mechanisms. Journal of Electromyography and Kinesiology 2014, 24, 98–103.
  34. Watsford, M.L.; Murphy, A.J.; McLachlan, K.A.; Bryant, A.L.; Cameron, M.L.; Crossley, K.M.; Makdissi, M. A prospective study of the relationship between lower body stiffness and hamstring injury in professional Australian rules footballers. The American Journal of Sports Medicine 2010, 38, 2058–2064.
  35. Amaral, J.L.; Sancho, A.G.; Faria, A.C.; Lopes, A.J.; Melo, P.L. Differential diagnosis of asthma and restrictive respiratory diseases by combining forced oscillation measurements, machine learning and neuro-fuzzy classifiers. Medical & Biological Engineering & Computing 2020, 58, 2455–2473.
  36. Song, X.; Gu, F.; Wang, X.; Ma, S.; Wang, L. Interpretable Recognition for Dementia Using Brain Images. Frontiers in Neuroscience 2021, 15, 748689.
  37. Sabol, P.; Sinčák, P.; Hartono, P.; Kočan, P.; Benetinová, Z.; Blichárová, A.; Verbóová, L.; Štammová, E.; Sabolová-Fabianová, A.; Jašková, A. Explainable classifier for improving the accountability in decision-making for colorectal cancer diagnosis from histopathological images. Journal of Biomedical Informatics 2020, 109, 103523.
  38. García-Pérez, P.; Lozano-Milo, E.; Landin, M.; Gallego, P.P. Machine Learning unmasked nutritional imbalances on the medicinal plant Bryophyllum sp. cultured in vitro. Frontiers in Plant Science 2020, 11, 576177.
  39. Apostolopoulos, I.D.; Groumpos, P.P.; Apostolopoulos, D.J. Advanced fuzzy cognitive maps: state-space and rule-based methodology for coronary artery disease detection. Biomedical Physics & Engineering Express 2021, 7, 045007.
  40. Juang, C.F.; Wen, C.Y.; Chang, K.M.; Chen, Y.H.; Wu, M.F.; Huang, W.C. Explainable fuzzy neural network with easy-to-obtain physiological features for screening obstructive sleep apnea-hypopnea syndrome. Sleep Medicine 2021, 85, 280–290.
  41. Ding, L.; Zhang, X.Y.; Wu, D.Y.; Liu, M.L. Application of an extreme learning machine network with particle swarm optimization in syndrome classification of primary liver cancer. Journal of Integrative Medicine 2021, 19, 395–407.
  42. El-Sappagh, S.; Alonso, J.M.; Islam, S.R.; Sultan, A.M.; Kwak, K.S. A multilayer multimodal detection and prediction model based on explainable artificial intelligence for Alzheimer’s disease. Scientific Reports 2021, 11, 2660.
  43. Kokkotis, C.; Ntakolia, C.; Moustakidis, S.; Giakas, G.; Tsaopoulos, D. Explainable machine learning for knee osteoarthritis diagnosis based on a novel fuzzy feature selection methodology. Physical and Engineering Sciences in Medicine 2022, 45, 219–229.
  44. Ntakolia, C.; Kokkotis, C.; Moustakidis, S.; Tsaopoulos, D. Identification of most important features based on a fuzzy ensemble technique: Evaluation on joint space narrowing progression in knee osteoarthritis patients. International Journal of Medical Informatics 2021, 156, 104614.
  45. Burkart, N.; Huber, M.F. A survey on the explainability of supervised machine learning. Journal of Artificial Intelligence Research 2021, 70, 245–317.
  46. Bucholc, M.; Ding, X.; Wang, H.; Glass, D.H.; Wang, H.; Prasad, G.; Maguire, L.P.; Bjourson, A.J.; McClean, P.L.; Todd, S.; et al. A practical computerized decision support system for predicting the severity of Alzheimer’s disease of an individual. Expert Systems with Applications 2019, 130, 157–171.
  47. Das, D.; Ito, J.; Kadowaki, T.; Tsuda, K. An interpretable machine learning model for diagnosis of Alzheimer’s disease. PeerJ 2019, 7, e6543. [Google Scholar] [CrossRef] [PubMed]
  48. Khan, I.U.; Aslam, N.; AlShedayed, R.; AlFrayan, D.; AlEssa, R.; AlShuail, N.A.; Al Safwan, A. A proactive attack detection for heating, ventilation, and air conditioning (HVAC) system using explainable extreme gradient boosting model (XGBoost). Sensors 2022, 22, 9235. [Google Scholar] [CrossRef]
  49. Calderón-Díaz, M.; Serey-Castillo, L.J.; Vallejos-Cuevas, E.A.; Espinoza, A.; Salas, R.; Macías-Jiménez, M.A. Detection of variables for the diagnosis of overweight and obesity in young Chileans using machine learning techniques. Procedia Computer Science 2023, 220, 978–983. [Google Scholar] [CrossRef]
Figure 1. Proposed architecture for the soccer player injury classification based on muscle biomechanical analysis.
Figure 2. Biomechanical test procedure.
Figure 3. Comparison of ML model testing accuracy (10-fold cross-validation).
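Figure 3 reports testing accuracy under ten-fold cross-validation. The following is a minimal sketch of that protocol in plain Python, not the study's implementation: it assumes a simple shuffled (non-stratified) split, uses a hypothetical nearest-centroid classifier as a stand-in for the models listed in Tables 1 and 2, and runs on synthetic data.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    if n_samples < k:
        raise ValueError("need at least k samples for k-fold cross-validation")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Hypothetical stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(column) for column in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def cross_val_accuracy(X, y, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and return all k test accuracies."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict_centroid(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Toy, synthetic two-class data (not the study's measurements)
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50
scores = cross_val_accuracy(X, y, k=10)
print(f"mean 10-fold accuracy: {mean(scores):.2f}")
```

Each observation is tested exactly once, and the summary accuracy is the mean over the ten folds. With imbalanced classes, a stratified split that preserves the class ratio in every fold is a common alternative.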
Figure 4. Feature importance
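XGBoost can rank features by several importance types (for example `weight`, `gain` and `cover`), and the type behind Figure 4 is not stated in this section. The sketch below therefore only illustrates the idea behind a gain-based ranking: sum the loss reduction of every split that uses a feature across all trees, then normalize. The split records and feature names are hypothetical, and positive gains are assumed.

```python
from collections import defaultdict


def gain_importance(splits):
    """Aggregate split records into normalized, gain-based feature importances.

    `splits` is a hypothetical list of (feature_name, gain) pairs, one per
    internal node across all trees of a boosted ensemble.
    """
    totals = defaultdict(float)
    for feature, gain in splits:
        totals[feature] += gain  # total loss reduction contributed by this feature
    grand_total = sum(totals.values())
    # Normalize so the importances sum to 1, then sort from most to least important.
    return sorted(
        ((feature, total / grand_total) for feature, total in totals.items()),
        key=lambda item: item[1],
        reverse=True,
    )


# Toy split records (hypothetical values, not the study's model)
toy_splits = [("f1", 4.0), ("f2", 1.0), ("f1", 2.0), ("f3", 3.0)]
for feature, importance in gain_importance(toy_splits):
    print(f"{feature}: {importance:.2f}")
```

A feature used in few but highly informative splits can outrank one used in many weak splits, which is why gain-based and count-based (`weight`) rankings may disagree.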
Table 1. ML models, configurations and description - part 1.

| No. | Model name | Model configuration | Model description |
|---|---|---|---|
| No.1 | Tree | 100 splits | A flowchart-like structure in which each internal node tests a feature, each branch represents a decision rule, and each leaf node represents an outcome. |
| No.2 | Tree | 20 splits | As No.1. |
| No.3 | Tree | 4 splits | As No.1. |
| No.4 | Linear discriminant | Full covariance structure | A statistical technique for binary and multiclass classification that finds the linear combination of features that best separates the classes. |
| No.5 | Quadratic discriminant | Full covariance structure | Similar to linear discriminant analysis and likewise assumes Gaussian-distributed features, but estimates a separate covariance matrix for each class, which yields quadratic decision boundaries. |
| No.6 | Binary GLM logistic regression | Binomial distribution | A generalized linear model that estimates the probability of a binary outcome using the logistic function. |
| No.7 | Efficient logistic regression | L2 regularization, alpha = 0.001, one-vs-one coding | Logistic regression implemented efficiently to handle large datasets or high-dimensional data. |
| No.8 | Efficient linear SVM | L2 regularization, alpha = 0.001, one-vs-one coding | A supervised learning algorithm, used for classification and regression, that finds the hyperplane that best separates the classes. |
| No.9 | Gaussian Naive Bayes | Gaussian distribution | A probabilistic classifier that assumes the features are conditionally independent given the class and models each feature with a Gaussian distribution. |
| No.10 | Kernel Naive Bayes | Normal kernel, data standardization | A version of the Naive Bayes classifier that estimates each feature's class-conditional distribution with a kernel density estimate instead of a single Gaussian, so it can capture non-Gaussian feature distributions. |
| No.11 | Linear SVM | Linear kernel, one-vs-one coding, data standardization | A supervised learning algorithm for classification that finds the hyperplane that best separates the classes in a linearly separable dataset. |
| No.12 | Quadratic SVM | Quadratic kernel, one-vs-one coding, data standardization | An extension of the SVM that uses a quadratic kernel to handle non-linearly separable data by mapping it into a higher-dimensional space. |
| No.13 | Cubic SVM | Cubic kernel, one-vs-one coding, data standardization | An extension of the SVM that uses a cubic kernel to handle strongly non-linear class boundaries by mapping the data into an even higher-dimensional space. |
| No.14 | Fine Gaussian SVM | Kernel scale = 1.6, one-vs-one coding, data standardization | An SVM with a Gaussian kernel and a small kernel scale, producing highly flexible decision boundaries that make fine distinctions between classes. |
| No.15 | Medium Gaussian SVM | Kernel scale = 6.5, one-vs-one coding, data standardization | An SVM with a Gaussian kernel and an intermediate kernel scale, suited to data of moderate complexity. |
| No.16 | Coarse Gaussian SVM | Kernel scale = 26, one-vs-one coding, data standardization | An SVM with a Gaussian kernel and a large kernel scale, producing smooth decision boundaries suited to data of lower complexity. |
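The three Gaussian SVM presets in Table 1 differ only in kernel scale (1.6, 6.5 and 26, each roughly four times the previous one). The sketch below shows how the scale changes the kernel value for the same pair of points. It assumes the convention in which predictors are divided by the kernel scale s before the Gaussian kernel is applied, i.e. K(x, z) = exp(-||x - z||² / s²); other libraries parameterize the same kernel differently (for example with a `gamma` factor), and the two observations are hypothetical.

```python
import math


def gaussian_kernel(x, z, kernel_scale):
    """Gaussian (RBF) kernel after dividing the predictors by the kernel scale.

    Assumed convention: K(x, z) = exp(-||(x - z) / s||^2), with s the kernel scale.
    """
    sq_dist = sum(((a - b) / kernel_scale) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist)


# Two hypothetical standardized observations, 2 units apart in each of 3 predictors
x = [0.0, 0.0, 0.0]
z = [2.0, 2.0, 2.0]
for name, scale in [("fine", 1.6), ("medium", 6.5), ("coarse", 26.0)]:
    print(f"{name:>6} (scale = {scale}): K = {gaussian_kernel(x, z, scale):.3f}")
```

A small scale makes the kernel decay quickly with distance, so only very close points influence each other and the boundary can bend sharply (fine); a large scale makes distant points look similar, giving a smoother boundary (coarse).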
Table 2. ML models, configurations and description - part 2.

| No. | Model name | Model configuration | Model description |
|---|---|---|---|
| No.17 | Fine KNN | Number of neighbors = 1, Euclidean distance | A non-parametric algorithm that assigns a data point the class of its single nearest neighbor, making fine distinctions between classes. |
| No.18 | Medium KNN | Number of neighbors = 10, Euclidean distance | A non-parametric algorithm that classifies a data point by the majority vote of its 10 nearest neighbors, making moderate distinctions between classes. |
| No.19 | Coarse KNN | Number of neighbors = 100, Euclidean distance | A non-parametric algorithm that classifies a data point by the majority vote of its 100 nearest neighbors, making coarse distinctions between classes. |
| No.20 | Cosine KNN | Number of neighbors = 10, cosine distance | A variant of the K-nearest neighbors algorithm that measures the similarity between data points with the cosine distance. |
| No.21 | Cubic KNN | Number of neighbors = 10, cubic (Minkowski, p = 3) distance | A variant of the K-nearest neighbors algorithm that classifies a data point by the majority vote of its neighbors under a cubic distance metric. |
| No.22 | Weighted KNN | Number of neighbors = 10, Euclidean distance | A variant of the K-nearest neighbors algorithm that weights each neighbor's vote according to its distance, so closer neighbors contribute more. |
| No.23 | Boosted trees (AdaBoost ensemble) | Decision tree learner, maximum splits = 20, learning rate = 0.1 | An ensemble learning method that builds a strong classifier by sequentially combining weak learners, such as decision trees, with the AdaBoost algorithm. |
| No.24 | Bagged trees (bag ensemble) | Decision tree learner, maximum splits = 109, number of learners = 30 | An ensemble learning technique that trains decision trees on bootstrap samples of the data and aggregates their votes to improve classification accuracy and stability. |
| No.25 | Subspace discriminant ensemble | Discriminant learner, number of learners = 30, subspace dimension = 10 | An ensemble that combines discriminant analysis models trained on random subsets of the features to improve classification performance. |
| No.26 | Subspace KNN ensemble | Subspace ensemble method, decision tree learner, number of learners = 30, learning rate = 0.1 | An ensemble that combines K-nearest neighbors models operating in different random feature subspaces to improve classification accuracy. |
| No.27 | RUSBoosted trees | RUSBoost ensemble method, decision tree learner, number of learners = 30, learning rate = 0.1 | A variant of the AdaBoost algorithm that incorporates random under-sampling to address class imbalance, particularly in binary classification problems. |
| No.28 | Neural network | 1 layer, 10 neurons, 1,000 iterations | A network of interconnected nodes, inspired by the structure of the human brain, capable of learning complex patterns and relationships in data. |
| No.29 | Neural network | 1 layer, 25 neurons, 1,000 iterations | As No.28. |
| No.30 | Neural network | 1 layer, 100 neurons, 1,000 iterations | As No.28. |
| No.31 | Neural network | 2 layers, 10 neurons, 1,000 iterations | As No.28. |
| No.32 | Neural network | 3 layers, 10 neurons, 1,000 iterations | As No.28. |
| No.33 | SVM kernel | SVM learner, lambda regularization = 0.01, one-vs-one coding, iteration limit = 1000 | A variant of the SVM algorithm that uses kernel methods to handle non-linear data by transforming it into a higher-dimensional space. |
| No.34 | Logistic regression kernel | Logistic regression learner, lambda regularization = 0.01, one-vs-one coding | A variant of logistic regression that uses kernel methods to handle non-linear data. |
| No.35 | XGBoost | Learning rate = 0.3, L2 regularization alpha = 0.001, sampling method = uniform | An optimized gradient boosting library designed for speed and performance, effective for classification and regression. |
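The KNN variants in Table 2 differ in the number of neighbors, the distance metric and whether votes are weighted. The sketch below puts those options behind one function. The squared-inverse distance weighting and the toy training points are assumptions for illustration, not the study's configuration.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cubic(a, b):
    # Minkowski distance with exponent p = 3
    return sum(abs(x - y) ** 3 for x, y in zip(a, b)) ** (1 / 3)


def cosine(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm


def knn_predict(X_train, y_train, query, k=10, distance=euclidean, weighted=False):
    """Classify `query` by the (optionally distance-weighted) vote of its k nearest neighbors."""
    neighbors = sorted(zip(X_train, y_train), key=lambda pair: distance(pair[0], query))[:k]
    votes = defaultdict(float)
    for x, label in neighbors:
        if weighted:
            d = distance(x, query)
            # Assumed squared-inverse weighting; an exact match dominates the vote
            votes[label] += 1.0 / (d * d) if d > 0 else float("inf")
        else:
            votes[label] += 1.0  # plain majority vote
    return max(votes, key=votes.get)


# Hypothetical toy training set
X_train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [5.0, 0.2]]
y_train = ["A", "A", "B", "B", "A"]
print(knn_predict(X_train, y_train, [0.2, 1.2], k=3))
print(knn_predict(X_train, y_train, [0.2, 1.2], k=3, distance=cosine))
print(knn_predict(X_train, y_train, [1.0, 0.05], k=3, weighted=True))
```

With k = 1 the vote reduces to copying the nearest neighbor's label (the "fine" setting), whereas a large k such as 100 averages over much of a small dataset, which is what makes the "coarse" variant smooth.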
Table 3. Most important features obtained by XGBoost (most frequently selected over 30 iterations).

| Feature | Number of repetitions |
|---|---|
| Maximum Force Hamstring Left | 28 |
| Stiffness Biceps Femoris Right | 28 |
| Stiffness Semitendinosus Right | 24 |
| Maximum Force Right Quadriceps | 21 |
| Eccentric Force of Hamstrings | 17 |
| Age | 16 |
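Table 3 counts how often each feature ranked among the most important over 30 iterations. The number of top features kept per iteration is not stated in this section, so the sketch below assumes a hypothetical top-k cut-off and uses synthetic, noisy importances for hypothetical features f1 to f5; it illustrates the counting step only.

```python
import random
from collections import Counter


def top_k_features(importances, k):
    """Return the names of the k features with the highest importance in one run."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def selection_frequency(runs, k):
    """Count how often each feature appears in the top-k across repeated runs."""
    counts = Counter()
    for importances in runs:
        counts.update(top_k_features(importances, k))
    return counts.most_common()


# Toy example: 30 runs over hypothetical features whose importances are noisy
rng = random.Random(42)
base = {"f1": 0.30, "f2": 0.25, "f3": 0.20, "f4": 0.15, "f5": 0.10}
runs = [{f: w + rng.gauss(0, 0.05) for f, w in base.items()} for _ in range(30)]
for feature, count in selection_frequency(runs, k=3):
    print(f"{feature}: selected in {count} of {len(runs)} runs")
```

Counting selections across repeated runs favors features that are consistently important rather than those that rank highly in a single, possibly lucky, split of the data.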
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.