
Exploring the State of Machine Learning and Deep Learning in Medicine: A Survey of Italian Research Community

17 July 2023


18 July 2023

Artificial Intelligence (AI) is becoming increasingly important, especially in the medical field. While AI has been used in medicine for some time, its growth in the last decade has been remarkable. Specifically, Machine Learning (ML) and Deep Learning (DL) techniques in medicine have been increasingly adopted thanks to the growing abundance of health-related data, improved suitability of such techniques for managing large data-sets, and more computational power. The Italian scientific community has been instrumental in advancing this research area. This article aims to conduct a comprehensive investigation of the ML and DL methodologies and applications used in medicine by the Italian research community in the last five years.
1. Introduction

Nowadays, Artificial Intelligence (AI) is playing an increasingly important role in the medical field, which has always been both a source of challenges and an important area for experimenting with and developing AI methodologies. One of the first and most prominent AI research areas is Machine Learning (ML) [1]. As in medicine, the observation and analysis of data are fundamental to ML. In the past century, however, the development and use of ML methodologies in medicine were very limited, as shown in Figure 1. There are several reasons for this limitation, including the fact that the application of AI in medicine initially focused on approaches other than ML, such as expert systems (e.g., [2]), and that collecting the “large” amounts of data needed to automatically discover hidden patterns was unthinkable at the time.
In recent years, the advent of novel paradigms such as big data and IoT, as well as new computational models and increased computational resources, has allowed AI, and machine learning in particular, to become a growing phenomenon in both industry and research. These new technologies have given researchers both new possibilities and new challenges. Furthermore, the maturity of AI methodologies has led to numerous results. In particular, looking at the production of scientific articles (see Figure 1), it is possible to see that the adoption of ML/DL methodologies has grown exponentially in the last two decades. This trend is particularly evident in the last 5 years (i.e., since 2018), and the Italian research community is one of the main worldwide players in this field, ranking seventh among countries by number of scientific papers indexed on SCOPUS (see Figure 2).
To get a picture of Italian scientific research on ML/DL in medicine, we carried out a systematic survey of the state of the art in Italy, following the latest trends that emerge from the scientific papers produced by the Italian community. In particular, we focused on the period starting in 2018. Notably, the query was performed on 13 January 2023, and we decided to also consider the papers that were published or are in publication in 2023.
In summary, with the aim of taking a snapshot of the current Italian research community working on ML, in this paper:
  • We review the state of the art in Italy in recent years (i.e., since 2018), focusing on ML/DL in medicine and including all medical areas.
  • We present a general map of ML/DL research in Italy.
  • We propose a categorization of ML/DL approaches in medicine.
  • We comprehensively classify the most relevant medicine-related ML/DL applications.
The paper is organized as follows: in Section 2, we present and discuss the methods used to gather the data for building this review. In particular, Section 2.1 describes the framework for the paper selection, Section 2.2 analyses the possible limitations of our work, and Section 2.3 presents the dimensions used for the paper classification. In Section 3, we present the output of our analyses. In Section 3.1, we present a general picture of ML/DL research in Italy since 2018. Then, in Section 3.2, we provide a comprehensive overview and classification of the relevant ML/DL papers in medicine. Finally, in Section 4 we present our final considerations.

2. Methods

In this section, we describe our methods: first, we illustrate the basis of our survey (Section 2.1) and discuss its limitations (Section 2.2), and then we present the dimensions of our analysis (Section 2.3).

2.1. The framework

Here we outline the process for selecting and analyzing the papers included in the review. Figure 3 provides a visual representation of our methodology. The blue ovals represent the activities that were performed automatically, the green boxes represent the activities that were performed manually, and the orange boxes report the number of papers output by the previous activity.
The starting point of our work is the output of the query in Figure 3, performed via SCOPUS and shown below:
(( TITLE-ABS-KEY (machine  AND  learning)
OR  TITLE-ABS-KEY (deep  AND  learning))
AND  (TITLE-ABS-KEY (medicine)
OR  TITLE-ABS-KEY (medical)  OR  TITLE-ABS-KEY (health)))
AND  PUBYEAR  >  2017
We selected all papers that contain the term “machine learning” or the term “deep learning” in the title, abstract, or keywords, together with a reference to a medical area (i.e., one of the terms “medicine”, “medical”, and “health”), that were published from 2018 onwards (i.e., PUBYEAR > 2017), and for which at least one author has an affiliation in Italy (i.e., LIMIT-TO ( AFFILCOUNTRY , “Italy” )).
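The structure of the query can be made explicit with a short sketch that assembles it from its three ingredients (method terms, medical terms, and the year and country restrictions). The helper names `title_abs_key`, `any_of`, and `build_query` are hypothetical, and appending the affiliation restriction as a trailing `AND (LIMIT-TO ...)` clause is an assumption about how the pieces described above were combined.

```python
def title_abs_key(*terms: str) -> str:
    """Wrap terms in one TITLE-ABS-KEY clause, joined with AND."""
    return "TITLE-ABS-KEY (" + " AND ".join(terms) + ")"


def any_of(*clauses: str) -> str:
    """Join clauses with OR and wrap the result in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def build_query(first_year: int = 2018, country: str = "Italy") -> str:
    # Method terms: "machine learning" or "deep learning".
    method = any_of(title_abs_key("machine", "learning"),
                    title_abs_key("deep", "learning"))
    # Medical terms: "medicine", "medical", or "health".
    domain = any_of(title_abs_key("medicine"),
                    title_abs_key("medical"),
                    title_abs_key("health"))
    # Assumption: the country restriction is appended as a final clause.
    return (f"({method} AND {domain}) "
            f"AND PUBYEAR > {first_year - 1} "
            f'AND (LIMIT-TO (AFFILCOUNTRY, "{country}"))')


query = build_query()
print(query)
```

Keeping the method terms and the medical terms in separate OR groups makes it easy to see that a paper must match at least one term from each group.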
This query returns 2742 papers. These papers are used to provide a general description of Italian research in the field (see Section 3.1). Then, we filtered the papers (i.e., applied further conditions to the SCOPUS query) to focus on the most significant works. We restricted the papers to analyze on the basis of:
  • type of paper: only research journal papers (i.e., we excluded review/survey papers and conference papers)
  • subject area: we have considered only relevant subject areas in SCOPUS, i.e., Medicine, Computer Science, Engineering, Biochemistry, Genetics and Molecular Biology, Neuroscience, Pharmacology, Toxicology and Pharmaceutics, Health Profession, Nursing, Dentistry, Immunology and Microbiology, Multidisciplinary
Moreover, we considered only journals that published at least five eligible papers in the period (i.e., the papers are published in 38 journals).
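As a minimal sketch of the journal threshold, assuming each paper is available as a record with a `journal` field (the record layout and the function name `keep_frequent_journals` are hypothetical, and the sample records are invented for illustration):

```python
from collections import Counter

MIN_PAPERS_PER_JOURNAL = 5  # threshold stated in the text


def keep_frequent_journals(papers, min_papers=MIN_PAPERS_PER_JOURNAL):
    """Keep only papers whose journal has at least `min_papers` eligible papers."""
    per_journal = Counter(p["journal"] for p in papers)
    return [p for p in papers if per_journal[p["journal"]] >= min_papers]


# Hypothetical records, for illustration only (not data from the survey).
papers = (
    [{"title": f"A{i}", "journal": "Journal A"} for i in range(6)]
    + [{"title": f"B{i}", "journal": "Journal B"} for i in range(2)]
)
kept = keep_frequent_journals(papers)
print(len(kept), sorted({p["journal"] for p in kept}))
```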
Thus, we obtained a set of 458 papers to be analyzed. These papers required manual pruning before we could begin the analysis phase. Some of them were not directly related to the medical field (e.g., medical problems were only cited as possible applications of the proposed approaches), the query did not exclude all the reviews/surveys, and there were position papers/letters about future perspectives. Additionally, we excluded any papers in which it was clear that Italian researchers did not contribute to the machine learning/deep learning aspects of the work. Next, we also considered 58 papers with a high number of citations (i.e., those in the 98th percentile) that were not selected by the previous filter. These papers were read and (manually) filtered in the same way.
After these filtering steps, we were left with 214 papers. We analyzed these papers using the criteria described in the following subsection.

2.2. Limitations

Our survey is of course subject to some limitations related to the search criteria we adopted. First, we focused our search on Scopus, thus excluding a priori papers that are indexed only by other databases such as Web of Science (WOS) and PubMed. However, Scopus usually covers more journals and records than Web of Science and PubMed, which makes it a suitable source for this kind of research, also considering the large coverage overlap among these databases. Furthermore, PubMed is usually more oriented to the medical domain than to the Computer Science topics described in this survey.
Another limitation may be attributed to the criteria adopted for selecting the relevant papers. In particular, we focused on research articles published in international journals, thus excluding conference proceedings, letters, reviews, and so on. The reasons for our choice were twofold. First, we wanted to concentrate on novel research only, and for this reason reviews and surveys were excluded. Second, we excluded conference papers because, in our opinion, novel but well-established research is commonly published in scientific journals rather than presented at international conferences, especially as regards the Italian research community. While this can in general be considered correct, in some cases outstanding research may also be presented at leading international conferences and/or included in articles of different types. To mitigate such issues, we also included seminal papers (i.e., top-cited articles in the 98th percentile) that were excluded by the filtering criteria described in the previous section. Such papers have a number of citations that ranges from 90 to more than 500.
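The selection of top-cited papers can be sketched as a percentile cut on citation counts. This is only an illustration: the function name `top_cited`, the record layout, and the choice of the "inclusive" percentile definition are assumptions, since the text does not specify how the 98th percentile was computed, and the citation counts below are invented.

```python
import statistics


def top_cited(papers, percentile=98):
    """Return papers whose citation count is at or above the given percentile."""
    counts = [p["citations"] for p in papers]
    # quantiles(..., n=100) returns the 99 cut points between percentiles;
    # the cut point for `percentile` sits at index `percentile - 1`.
    cuts = statistics.quantiles(counts, n=100, method="inclusive")
    threshold = cuts[percentile - 1]
    return [p for p in papers if p["citations"] >= threshold]


# Hypothetical citation counts, for illustration only (not data from the survey).
papers = [{"id": i, "citations": i} for i in range(1, 201)]
seminal = top_cited(papers)
print([p["id"] for p in seminal])
```

In the survey, a cut of this kind was applied only to papers that the earlier filters had excluded, and the resulting papers were then pruned manually in the same way as the others.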
Finally, it is worth noting that, at the time of writing, researchers belonging to the Italian community have contributed in the last five years to more than 40 books on the topics analysed by this survey.
All of these facts suggest that the research community is fervent, even beyond the results shown by this survey.

2.3. Analysis Criteria

We analyzed the papers considering the following dimensions:
  • the medical topics;
  • the type of data;
  • the type of preprocessing methods;
  • the learning methods;
  • the evaluation methods.
To identify the most common medical topics addressed in the papers, we systematically recorded and analyzed them.
To classify the type of data used, we adapted the taxonomy proposed in [3] and included four categories: Clinical Images, Biosignals, Biomedicine, and Electronic Health Records (EHR). However, we also encountered several types of data that did not fit within these four categories, which were sourced from diverse and problem-specific contexts. Given that the proportion of papers using such data was relatively low (around 10%) but relevant in the general context, we created a new generic class called “Others”. Figure 4 provides a graphical representation of the taxonomy and its related sub-areas.
Regarding the type of pre-processing methods and the learning methods, we classify the papers in the macro-areas according to the methods used. Note here that papers may belong to multiple classes if they encompass the use of methodologies belonging to different macro-areas.
We consider the following macro-classes for the type of pre-processing methods:
  • feature selection;
  • feature extraction;
  • feature reduction;
  • data filtering;
  • data normalization;
  • missing data management;
  • undersampling;
  • oversampling;
  • other.
We instead consider the following macro-classes for the type of learning methods:
  • ML supervised
  • ML unsupervised
  • ML semisupervised
  • ML reinforcement learning
  • DL supervised
  • DL unsupervised
  • DL semisupervised
  • DL reinforcement learning
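One possible way to record these dimensions for each paper is sketched below. It is an illustrative data structure, not the annotation tool used for the survey: all class and field names are hypothetical. Sets are used for data types, pre-processing classes, and learning methods because a paper may belong to several classes at once.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class DataType(Enum):
    CLINICAL_IMAGES = "Clinical Images"
    BIOSIGNALS = "Biosignals"
    BIOMEDICINE = "Biomedicine"
    EHR = "Electronic Health Records"
    OTHERS = "Others"


class Paradigm(Enum):
    SUPERVISED = "supervised"
    UNSUPERVISED = "unsupervised"
    SEMISUPERVISED = "semisupervised"
    REINFORCEMENT = "reinforcement learning"


@dataclass(frozen=True)
class LearningMethod:
    family: str        # "ML" or "DL"
    paradigm: Paradigm


@dataclass
class PaperAnnotation:
    reference: int
    topic: str
    data_types: set = field(default_factory=set)
    preprocessing: set = field(default_factory=set)  # e.g. "feature selection"
    learning: set = field(default_factory=set)


def count_by_learning_method(annotations):
    """Count papers per learning macro-class; multi-class papers count once per class."""
    counts = Counter()
    for annotation in annotations:
        counts.update(annotation.learning)
    return counts


ML_SUP = LearningMethod("ML", Paradigm.SUPERVISED)
DL_SUP = LearningMethod("DL", Paradigm.SUPERVISED)

# Hypothetical annotations, for illustration only.
annotations = [
    PaperAnnotation(1, "Topic A", {DataType.EHR}, {"feature selection"}, {ML_SUP}),
    PaperAnnotation(2, "Topic B", {DataType.CLINICAL_IMAGES, DataType.EHR},
                    set(), {ML_SUP, DL_SUP}),
]
counts = count_by_learning_method(annotations)
print(counts[ML_SUP], counts[DL_SUP])
```

Because a paper can appear in more than one class, the per-class counts can add up to more than the number of papers, so percentages computed per class need not sum to 100%.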
Moreover, we made some (simple quantitative) considerations about the journals of publication.

3. Results

In this section, we present the output of our analyses:
  • in Section 3.1 we provide a general analysis on all papers to provide a (simple) general snapshot of ML/DL Italian research in the medical area;
  • in Section 3.2 we provide a systematic analysis of the selected papers as described in Section 2.

3.1. A description of Italian Machine Learning/Deep Learning research in the medical area at a glance

In this section, we give a general description of the whole Italian research community through the papers published and indexed by SCOPUS since 2018.
As described in Section 1, the Italian community is one of the most productive players in the area, with more than 2500 papers. Figure 5 shows a continuously increasing trend over the last 5 years. Interestingly, most papers are published in international journals (i.e., around 74%) and are open-access (i.e., around 59%). These papers involve 74 Italian institutions, including not only universities and research institutes but also hospitals. This shows that participation is wide-ranging and involves actors on both the AI and the medical side, and it points to a link between research and academic groups and their local communities.
From the point of view of funding sponsors, a wide spectrum of national and international funders can be observed. Figure 6 shows the top 10 funding sponsors acknowledged in the papers. Numerically, the European Union was the leading sponsor. Notably, the entry “European Union” groups different types of grants, e.g., Horizon 2020, the 7th Framework Programme for Research, and the European Research Council. Moreover, the financial sponsorship of the Italian government is very relevant, through grants provided by two different ministries (i.e., the Ministry of Education, University and Research, and the Ministry of Health). The other funding sponsors confirm well-established participation in international projects funded by grants, particularly from U.S.A. and U.K. agencies. Notably, the international involvement of the Italian research community is very high, as confirmed by the data on the countries of affiliation of co-authors (see Figure 7).

3.2. Systematic analysis

On the basis of criteria and the “manual pruning” phase described in Section 2.1, we have selected 224 papers. These papers have been analysed through the criteria described in Section 2.3.
Figure 8 shows the source journals of the 224 papers. The papers are distributed across 42 journals. In Figure 8, the journals are listed alphabetically. In the period analyzed, the top 3 journals by number of publications are IEEE Access, Scientific Reports, and Applied Sciences.
Let us point out that not all of the 224 papers are analyzed with our methodology, since 10 of them (i.e., [4,5,6,7,8,9,10,11,12,13]) propose new ML/DL methodologies, metrics, or approaches that may be strongly related to the medical area but are not focused on a specific disease or on a specific case study. However, we believe it is very important that the Italian community not only provides a bridge between the ML/DL area and the medical area, but also proposes solutions to general issues that arise from the peculiarities of the medical field.
First, let us focus on the medical topics considered in the papers we have analysed. Figure 9 shows the distribution of these medical topics, and Table 2 reports the paper classification by medical topic. Note that all the topics considered by just one paper are included in the Other category.
Table 1. Classification by approach.
Methodology Approach Reference
Machine Learning Unsupervised
Machine Learning Reinforcement Learning
Deep Learning Unsupervised
Deep Learning Reinforcement Learning
Studies that adopt multiple approaches may appear in more than one row.
The most frequently addressed single topic is SARS-CoV-2 (i.e., 11.2%). This result is not surprising, since the period considered includes the pandemic, when most of the efforts of the scientific community were focused on SARS-CoV-2. Cancer is also a very important topic (i.e., 16.77%), with breast cancer being one of the most considered topics (i.e., 7%), along with lung cancer, prostate cancer, and colorectal cancer, each of which is considered by more than one paper. Such results are in line with cancer incidence, as these are among the most common types of cancer in Europe [225]. Since different types of cancer are considered, we grouped those appearing in just one article into the other types of cancer category (i.e., 4.2%). Another relevant topic is Parkinson’s disease (i.e., 3.7%).
Next, let us analyse the type of data used. Table 3 shows that the majority of the papers (i.e., 82.2%, 176 papers) use only one data type. However, it is quite interesting to note that 38 papers (i.e., 17.8%) use 2 or 3 different types of data, and therefore deal with the issue of managing data with different characteristics.
Table 2. Classification by medical topic
Topic Reference
Alzheimer’s disease [41,42,43,174,191,226]
Autism Spectrum Disorders [47,227]
Brain Tumors [50,188,195]
Breast Cancer [15,16,17,18,19,52,53,54,55,56,57,175,196,197,198]
Cardiovascular disease [60,61,62,176,201]
Chronic Kidney Disease [67,68,202]
Dementia [77,78,79,179]
Diabetes [80,81,82,83,84,85,204]
Exposure to extremely low frequency waves [20,21]
Glioblastoma [91,206]
Heart Failure [92,93,94,182,207]
kidney disease [100,228]
Lung Cancer [23,103,104,105,106]
Melanoma [109,110]
Multiple Sclerosis [24,113,211]
Parkinson’s Disease [25,123,124,125,126,127,128,129]
Prostate Cancer [130,131]
Rectal cancer [134,135,184]
SARS-CoV-2 [27,34,35,46,143,144,145,146,147,185,186,215,216]
Seasonal flu [28,156]
Sepsis [157,158,159]
Stroke [165,219]
Varicella Zoster [169,224]
Voice-related pathologies [172,173]
Other types of cancer [29,31,58,59,66,69,160,199,214]
Surgery related [63,99,121,132,161,162,192,220]
M-health [48,71,76,107,164,180]
Patient tele-Monitoring [26,33,112,210]
Liver diseases [89,95,102]
Orthopedic [90,122,133]
Arterial Disease [44,45,73]
Trauma [70,168]
Other [14,36,37,38,39,40,49,51,64,65,177,194,200,203]
Figure 10 shows the distribution of the types of data used in the papers we analyzed. The most used type of data is EHR, followed by Clinical Images. Note that the class Others reaches 15.7%, which is higher than both the Biomedicine and the Biosignals categories. This may indicate that several medical topics and diseases require managing many different types of data to assess and characterize their complex features.
As for pre-processing, Table 4 shows that in the majority of the papers (i.e., 128, about 60%) the authors report applying at least one pre-processing technique.
Figure 11 shows the distribution of the principal pre-processing techniques. Feature selection and feature extraction are the most used techniques, covering about 50% of cases.
Figure 12 shows the distribution of the ML/DL methodologies adopted, and Table 1 presents the categorization of papers by methodology. As described in Section 2.3, we classified the approaches used into 8 macro-categories. The majority of the papers (i.e., 171, about 80%) use one or more approaches belonging to only one of these categories, whereas the remaining papers (i.e., 43, about 20%) use two or more approaches belonging to two categories. It is quite interesting to note that ML approaches (i.e., 72.8%) are used more than DL approaches, and that the most used approach belongs to the ML supervised category (i.e., 62.5%). In general, supervised approaches, including both ML and DL, are largely adopted (i.e., 86.3%). These facts underline how ML approaches still represent the most used approaches in the medical field in Italy, probably due to the scarcity of the data needed to train DL approaches, which are nevertheless widely applied to image-related data. Furthermore, most learning methodologies are supervised, and thus focused on a specific outcome that is already present in the training data and that clearly determines the medical question that the model should address.
We also analyzed the evaluation methodology employed in the papers. Our findings revealed significant heterogeneity among the techniques and statistical measures used, underlining the absence of a standardized approach within the research community for this aspect. However, we observed that the validation phase is predominantly (though not exclusively) performed using two main methods: a fixed split of the dataset into a training set and a test set, and k-fold cross-validation. The fixed split method was employed in 49 papers, accounting for approximately 22.9%. The most commonly used split values were 90%-10%, 80%-20%, and 70%-30%. The k-fold cross-validation method, on the other hand, was used in 99 papers, representing about 46% of the sample. Various values of k were used, the most frequent being 10, 5, and 3. Regarding the statistical measures employed, we observed a wide variety, but three stood out as the most commonly used: accuracy, ROC-AUC, and F1-score. Accuracy was used in 112 papers (about 52.3%), ROC-AUC in 81 papers (about 37.6%), and F1-score in 66 papers (about 31%). Once again, the absence of standardization in the selection of statistical measures for evaluating trained ML/DL models becomes apparent.
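For readers less familiar with these validation schemes and measures, the following dependency-free sketch shows how a fixed split, k-fold cross-validation indices, accuracy, F1-score, and ROC-AUC can be computed for a binary problem. It is a minimal illustration under simplifying assumptions (no shuffling or stratification, binary labels, and invented labels and scores), not the procedure used in any of the surveyed papers; in practice an established library would normally be used instead.

```python
def fixed_split(n_samples, test_fraction):
    """Return (train, test) index lists for a single hold-out split."""
    n_test = round(n_samples * test_fraction)
    cut = n_samples - n_test
    return list(range(cut)), list(range(cut, n_samples))


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1, 0.55, 0.5]
y_pred = [int(s >= 0.5) for s in scores]

train_idx, test_idx = fixed_split(len(y_true), test_fraction=0.2)  # 80%-20% split
folds = list(k_fold_indices(len(y_true), k=5))

print(len(train_idx), len(test_idx), len(folds))
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

On the same predictions the three measures can give noticeably different values, which is one reason why results reported with different measures are hard to compare across papers.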
In conclusion, our analysis highlights the lack of consensus in the research community regarding the choice of evaluation techniques and statistical measures for assessing ML/DL models. At the same time, it reveals the prevalence of a few specific evaluation methods, which could be regarded as potential best practices within the research community.

4. Discussion

The analysis of the state of the art in scientific papers focusing on ML and DL for medicine over the last five years has uncovered a rapidly expanding research area with substantial potential for applications in healthcare (see Figure 1).
First, we proposed a methodological analysis of the papers indexed in SCOPUS, identifying a common set of dimensions. Our analysis encompassed a total of 2,742 papers, out of which we conducted a detailed methodological examination of 516 papers. Among these, 214 were studied using the dimensions we proposed. The findings provide a comprehensive overview of the Italian research landscape in this field (see Figure 2). Furthermore, they highlight how the community has worked on a very heterogeneous range of medical problems.
It is important to acknowledge that the use of ML and DL methodologies raises several legal and ethical concerns; the analysis and discussion of these topics are outside the scope of this paper. Nevertheless, the growing interest in and adoption of ML/DL systems in the medical field, along with the positive results obtained, indicate the potential for these systems to serve as valuable tools in laboratory settings in the coming years.

Table 3. Number of data types used.
Number of data types Number of Papers
1 176
2 32
3 6
Table 4. Number of pre-processing methods used.
Number of pre-processing methods Number of Papers
0 86
1 87
2 29
3 10
4 2
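The totals and shares discussed in Section 3.2 can be re-derived from the counts in Tables 3 and 4. The short sketch below does this; the counts are copied from the two tables, while the variable and function names are only illustrative.

```python
def shares(counts):
    """Map each key to its percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {key: round(100 * value / total, 1) for key, value in counts.items()}


# Counts copied from Table 3 (data types per paper)
# and Table 4 (pre-processing methods per paper).
data_types_per_paper = {1: 176, 2: 32, 3: 6}
preprocessing_per_paper = {0: 86, 1: 87, 2: 29, 3: 10, 4: 2}

total_papers = sum(data_types_per_paper.values())
multi_type = sum(v for k, v in data_types_per_paper.items() if k > 1)
with_preprocessing = sum(v for k, v in preprocessing_per_paper.items() if k > 0)

print("papers:", total_papers, sum(preprocessing_per_paper.values()))
print("data type shares (%):", shares(data_types_per_paper))
print("papers with 2 or 3 data types:", multi_type)
print("papers with at least one pre-processing method:", with_preprocessing)
```

Both tables sum to the same number of analyzed papers, which is a quick consistency check on the classification.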
