3.1. A description of Italian Machine Learning/Deep Learning research in the medical area at a glance
In this section, we give a general description of the whole Italian research community through the papers published and indexed by SCOPUS since 2018. As described in Section 1, the Italian community is one of the most productive players in the area, with more than 2500 papers. Figure 5 shows a continuously increasing trend over the last 5 years. It is quite interesting to point out that most papers are published in international journals (i.e., around 74%) and are open-access (i.e., around 59%). These papers involve 74 Italian institutions, which include not only universities and research institutes but also hospitals. This shows that participation is wide-ranging and involves actors on both the AI and the medical side, and it points to a link between research and academic groups and their local communities.
From the funding sponsors’ point of view, there is a wide spectrum of national and international funding sponsors. Figure 6 shows the top 10 funding sponsors acknowledged in the papers. Numerically, the European Union is the first sponsor. Notably, the entry “European Union” groups different types of grants, e.g., Horizon 2020, the 7th Framework Programme for Research, and the European Research Council. Moreover, the financial sponsorship of the Italian government is very relevant, through grants provided by two different ministries (i.e., the Ministry of Education, University and Research, and the Ministry of Health). The other funding sponsors confirm well-established participation in international projects funded by grants, particularly from U.S.A. and U.K. agencies. Notably, the Italian research community’s international involvement is very high, as confirmed by the data on the national affiliations of co-authors (see Figure 7).
3.2. Systematic analysis
On the basis of the criteria and the “manual pruning” phase described in Section 2.1, we selected 224 papers. These papers were analyzed through the criteria described in Section 2.3. Figure 8 shows the source journals of the 224 papers, which are distributed across 42 journals. In Figure 8, the journals are listed in alphabetical order. In the period analyzed, the top 3 journals by number of publications are IEEE Access, Scientific Reports, and Applied Sciences.
Let us point out that not all of the 224 papers are analyzed with our methodology, since 10 of them (i.e., [4,5,6,7,8,9,10,11,12,13]) propose new ML/DL methodologies, metrics, or approaches that may be strongly related to the medical area but are not focused on a specific disease or case study. However, we believe it is very important that the Italian community not only provides a bridge between the ML/DL area and the medical area, but also proposes solutions to general issues that arise from the peculiarities of the medical field.
First, let us focus on the medical topics considered in the papers we have analyzed. Figure 9 shows the distribution of these medical topics, and Table 2 reports the paper classification by medical topic. Note that all the topics considered by just one paper are included in the Other category.
Table 1. Classification by approach.
Methodology | Approach | Reference |
Machine Learning | Unsupervised | [14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35] |
Machine Learning | Supervised | [14,31,33,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,187,170] |
Machine Learning | Semi-Supervised | [34,81,174,188,189] |
Machine Learning | Reinforcement Learning | [187,190,191] |
Deep Learning | Unsupervised | [32,46,64,175,192,193] |
Deep Learning | Supervised | [176,177,178,179,180,181,182,183,184,185,186,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,36,37,43,50,84,101,108,116,189,117,124,132,136,140,142,144,152,159,193] |
Deep Learning | Semi-Supervised | [26] |
Deep Learning | Reinforcement Learning | [23,199] |
The most frequently addressed single topic is SARS-CoV-2 (i.e., 11.2%). This result is not surprising, since the period considered includes the pandemic, when most of the scientific community’s efforts were focused on SARS-CoV-2. Cancer, considered across all its types, is also a very important topic (i.e., 16.77%), with breast cancer being one of the most considered topics (i.e., 7%), along with lung cancer, prostate cancer, and colorectal cancer, each of which is considered by more than one paper. These results are in line with cancer incidence, since these are among the most common types of cancer in Europe [225]. Since different types of cancer are considered, we grouped those appearing in just one article in the Other types of cancer category (i.e., 4.2%). Another relevant topic is Parkinson’s disease (i.e., 3.7%).
Next, let us analyze the type of data used. Table 3 shows that the majority of the papers (i.e., 82.2%, 176 papers) use only one data type. However, it is quite interesting to note that 38 papers (i.e., 17.5%) use 2 or 3 different types of data, and thus deal with the issue of managing data with different characteristics.
Table 2. Classification by medical topic.
Topic | Reference |
Alzheimer’s disease | [41,42,43,174,191,226] |
Autism Spectrum Disorders | [47,227] |
Brain Tumors | [50,188,195] |
Breast Cancer | [15,16,17,18,19,52,53,54,55,56,57,175,196,197,198] |
Cardiovascular disease | [60,61,62,176,201] |
Chronic Kidney Disease | [67,68,202] |
Dementia | [77,78,79,179] |
Diabetes | [80,81,82,83,84,85,204] |
Exposure to extremely low frequency waves | [20,21] |
Glioblastoma | [91,206] |
Heart Failure | [92,93,94,182,207] |
Kidney disease | [100,228] |
Lung Cancer | [23,103,104,105,106] |
Melanoma | [109,110] |
Multiple Sclerosis | [24,113,211] |
Parkinson’s Disease | [25,123,124,125,126,127,128,129] |
Prostate Cancer | [130,131] |
Rectal cancer | [134,135,184] |
SARS-CoV-2 | [27,34,35,46,143,144,145,146,147,185,186,215,216,148,149,150,151,152,153,154,155,187,193,217] |
Seasonal flu | [28,156] |
Sepsis | [157,158,159] |
Stroke | [165,219] |
Varicella Zoster | [169,224] |
Voice-related pathologies | [172,173] |
Other types of cancer | [29,31,58,59,66,69,160,199,214] |
Surgery related | [63,99,121,132,161,162,192,220] |
M-health | [48,71,76,107,164,180] |
Patient tele-Monitoring | [26,33,112,210] |
Liver diseases | [89,95,102] |
Orthopedic | [90,122,133] |
Arterial Disease | [44,45,73] |
Trauma | [70,168] |
Other | [14,36,37,38,39,40,49,51,64,65,177,194,200,203,22,32,72,74,75,86,87,88,178,181,205,208,96,97,98,101,108,111,114,115,116,117,189,190,209,118,119,120,136,137,138,139,140,141,142,183,212,213,218,30,163,166,167,170,171,221,222,223] |
Figure 10 shows the distribution of the types of data used in the papers we analyzed. The most used type of data is E.H.R., and the second is Clinical images. Let us point out that the class Other reaches 15.7%, which is higher than the Biomedicine and Biosignal categories. This may indicate that several medical topics and diseases require managing many different types of data to assess and characterize their complex features.
Concerning the pre-processing category, Table 4 shows that in the majority of the papers (i.e., 128, about 60%) the authors applied at least one pre-processing technique. Figure 11 shows the distribution of the principal pre-processing techniques. We can see that feature selection and feature extraction are the most used techniques, covering about 50% of cases.
Figure 12 shows the distribution of the ML/DL methodologies adopted, and Table 1 presents the categorization of the papers by methodology. As described in Section 2.3, we cataloged the approaches used into 8 macro-categories. The majority of the papers (i.e., 171, about 80%) use one or more approaches belonging to only one of these categories, whereas the remaining papers (i.e., 43, about 20%) use two or more approaches belonging to two categories. It is quite interesting to note that ML approaches (i.e., 72.8%) are used more than DL approaches, and that the most used approach belongs to the ML supervised category (i.e., 62.5%). In general, supervised approaches, including both ML and DL, are largely adopted (i.e., 86.3%). These facts underline how ML approaches still represent the most used approaches in the medical field in Italy, probably due to the scarcity of the data needed to train DL approaches, which are, however, widely applied to image-related data. Furthermore, most learning methodologies are supervised, and thus focused on a specific outcome that is already present in the training data and that clearly determines the medical question the model should address.
We also analyzed the evaluation methodology employed in the papers. Our findings reveal a significant heterogeneity among the techniques and statistical measures used, underlining the absence of a standardized approach within the research community for this aspect. However, the validation phase is predominantly (though not exclusively) performed using two main methods: a fixed split of the dataset into a training set and a test set, and k-fold cross-validation. The fixed split method was employed in 49 papers (approximately 22.9%); the most commonly used split values were 90%-10%, 80%-20%, and 70%-30%. The k-fold cross-validation method was used in 99 papers (about 46%); various values of k were used, the most frequent being 10, 5, and 3. Regarding the statistical measures employed, we observed a wide variety, but three measures stood out as the most commonly used: accuracy, ROC-AUC, and F1-score. Accuracy was used in 112 papers (about 52.3%), ROC-AUC in 81 papers (about 37.6%), and F1-score in 66 papers (about 31%). Once again, the absence of standardization in the selection of statistical measures for evaluating trained ML/DL models becomes apparent.
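As a minimal illustration of the two validation schemes and the three measures discussed above, the following Python sketch builds a fixed split and a k-fold partition and computes accuracy, F1-score, and ROC-AUC for a binary task. It is not taken from any of the surveyed papers: the function names (`holdout_indices`, `k_fold_indices`, `accuracy`, `f1_score`, `roc_auc`) and the toy labels and scores are hypothetical, and in practice an established library would typically be used instead of these hand-written helpers.

```python
import random


def holdout_indices(n_samples, test_fraction=0.2, seed=0):
    """Fixed split: shuffle once and reserve a fraction of samples for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_fold_indices(n_samples, k, seed=0):
    """k-fold cross-validation: every sample is used for testing exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): true labels and classifier scores for 8 patients.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold chosen for illustration

    print("accuracy:", accuracy(y_true, y_pred))
    print("F1-score:", f1_score(y_true, y_pred))
    print("ROC-AUC:", roc_auc(y_true, scores))

    train_idx, test_idx = holdout_indices(len(y_true), test_fraction=0.2)
    print("80%-20% split:", train_idx, test_idx)
    for train_idx, test_idx in k_fold_indices(len(y_true), k=4):
        print("fold:", train_idx, test_idx)
```

Note that accuracy and F1-score depend on the decision threshold applied to the scores, whereas ROC-AUC is computed from the scores directly; this is one reason why results reported with different measures are hard to compare across papers.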
In conclusion, our analysis highlights the lack of consensus in the research community regarding the choice of evaluation techniques and statistical measures for assessing ML/DL models. At the same time, it shows the prevalence of a few specific evaluation methods, which could be considered potential best practices within the research community.