1. Introduction
To date, skin cancer is the most frequently diagnosed form of oncopathology in humans and comprises a wide range of malignancies [1]. Skin cancer accounts for more than 40% of all cancer cases diagnosed worldwide [2]. The sharp increase in its incidence is attributed to chronic exposure to ultraviolet (UV) radiation [3] and to the predominance of skin phototypes I-II in the population [4,5], which are associated with a high risk of malignant pigmented neoplasms.
Skin cancer can be divided into two types: non-melanoma and melanoma [6]. According to the World Health Organization (WHO), 325,000 new cases of melanoma were registered in 2020 [7]; of these, more than 17% ended in death because the disease was diagnosed at the last stage [8]. The median five-year survival rate for patients diagnosed with early-stage melanoma is about 99% [9]. When the disease has spread to the lymph nodes, the five-year survival rate drops to 68% [10], and when it metastasizes to distant organs, the rate is 27% [11].
Non-melanoma skin cancer (NMSC) includes basal cell carcinoma, squamous cell carcinoma, and other less common skin cancers [12]. NMSC accounts for about one third of all malignant neoplasms diagnosed annually worldwide [13]. Although NMSC is 18-20 times more common than melanoma, epidemiological data for this type of cancer are limited [14]. NMSC is often not registered in national cancer registries, or is registered incompletely, since in most cases it is successfully treated with excision [15] or ablation [16]. A tumor diagnosed by histopathology is coded according to the International Classification of Diseases (ICD) [17]. In ICD-10, melanoma has its own code, C43, so the statistics for this diagnosis are reliable. In contrast, the heterogeneous group of NMSCs shares a single code, C44, covering all types of non-melanoma skin cancer [18]. Therefore, separate data on basal cell carcinoma, squamous cell carcinoma, and other skin malignancies are not available [19], which makes it difficult to count and accurately assess individual NMSC diagnoses [2]. Thus, there is a need for balanced auxiliary diagnostic tools capable of identifying various types of melanoma and non-melanoma malignant skin lesions, including basal cell carcinoma, squamous cell carcinoma, and others.
The risk of malignant skin lesions is significantly influenced by statistical factors such as age, gender, localization of the pigmented lesion on the body, genetic predisposition, and melanin content in the skin layers [20]. The incidence of melanoma increases with age: the average age at diagnosis is approximately 60 years [21]. The relationship between the occurrence of malignant pigmented lesions and age becomes especially pronounced in people over 75 years of age, in whom the incidence rate doubles [22]. Gender also has a significant impact on the risk of skin cancer: the incidence of melanoma in men is 1.5 times higher than in women [23]. The incidence of NMSC is likewise closely related to age and gender. At a young age, the prevalence of all types of NMSC is the same in both sexes, but in men older than 45 years NMSC is diagnosed 2-3 times more often than in women [24]. Therefore, primary diagnosis should take into account not only visual analysis but also the complete clinical picture of each patient.
To date, the main method of skin cancer detection is visual clinical examination using dermoscopy [25]. Dermoscopy is a non-invasive method that allows the study of diagnostically significant morphological features of pigmented skin lesions [26]. The average accuracy of visual diagnosis of malignant tumors by an experienced dermatologist is 65-75% [27,28], because benign and malignant skin lesions can have similar morphological manifestations, which makes early diagnosis difficult. Visual diagnosis also requires extensive training and experience in dermatology [29]. If a malignancy is suspected, a biopsy is performed for histopathological examination, which is an invasive diagnostic method. Histopathological analysis is considered the "gold standard" for diagnosing skin cancer; however, it is time-consuming and may be inconclusive in borderline cases, and the diagnoses of individual pathologists can differ in up to 25% of cases [30,31].
Artificial intelligence technologies make it possible to analyze pigmented skin lesions faster, more conveniently, and more affordably [32]. The main task of such systems is the preliminary assessment of suspicious pigmented skin lesions using machine learning methods trained on high-quality, histopathologically confirmed clinical images [33]. However, because of the possibility of false negative predictions, such systems cannot replace the decisive opinion of the pathologist and dermatologist-oncologist in the diagnosis of skin cancer [34]. The development of high-precision intelligent systems that can serve as auxiliary tools for detecting malignant neoplasms at an early stage is therefore highly relevant.
One of the main problems of existing medical datasets is that the data distribution is skewed toward the category of healthy patients [35]. The under- or over-representation of one or more categories is associated with the clinical characteristics of the patients and of the disease, as well as with the outcomes of the studies that produced the data. As a result, datasets contain a large number of negative cases compared with a small number of positive cases of pathology [36]. Because traditional machine learning methods are dominated by the more common categories, their predictions are good for the majority categories but not accurate enough for the minority ones. This creates a risk of false negative predictions, which can have fatal consequences for patients.
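To make the majority-class effect concrete, the following minimal Python sketch uses a hypothetical toy dataset (not the ISIC data) in which 95% of cases are benign. A degenerate classifier that always predicts "benign" reaches high overall accuracy while missing every malignant case, which is exactly the false negative risk described above. The function names are illustrative.

```python
# Hypothetical toy labels: 95 benign (0) and 5 malignant (1) cases.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, cls):
    """Fraction of samples of class `cls` that were predicted as `cls`."""
    idx = [i for i, t in enumerate(y_true) if t == cls]
    return sum(y_pred[i] == cls for i in idx) / len(idx)


acc = accuracy(y_true, y_pred)             # high overall accuracy
rec_benign = recall(y_true, y_pred, 0)     # every benign case found
rec_malignant = recall(y_true, y_pred, 1)  # every malignant case missed
print(acc, rec_benign, rec_malignant)
```

Overall accuracy alone therefore hides the failure on the minority class, which is why per-class metrics are reported later in the paper.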
There are several approaches to the problem of data imbalance, based on transforming the training data [37], modifying the training methods [38], developing one-class classifiers [39], and building classifier ensembles [40]. Data augmentation based on affine transformations and similar simple operations increases the amount of training data by introducing minor changes in the color, size, and shape of images [41]. However, such simple operations are not enough to significantly increase the recognition accuracy for the minority category or to overcome overfitting [42]. Resampling balances the training data by oversampling the minority categories [43], undersampling the majority categories [44], or combining both [45]. A significant disadvantage of this method is that diagnostically significant data can be discarded during training, and that computational costs increase during data processing [46].
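The two resampling strategies can be sketched in plain Python as follows. This is an illustrative implementation with hypothetical function names, not the procedure used in the cited works. It shows where the drawbacks come from: random oversampling duplicates minority samples (more data to process and a risk of memorizing duplicates), while random undersampling discards majority samples that may carry diagnostically significant information.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    return by_class


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(v) for v in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for s in items + extra:
            out_x.append(s)
            out_y.append(y)
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class equal in size to the smallest class;
    the remaining majority samples are discarded."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(v) for v in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        for s in rng.sample(items, target):
            out_x.append(s)
            out_y.append(y)
    return out_x, out_y


# Hypothetical example: 10 benign samples and 2 melanoma samples.
x = list(range(12))
y = ["benign"] * 10 + ["melanoma"] * 2
print(Counter(random_oversample(x, y)[1]))   # both classes now have 10 samples
print(Counter(random_undersample(x, y)[1]))  # both classes now have 2 samples
```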
Another approach is to modify the training method with weighting factors, so that errors on minority categories incur higher losses [47,48]. Since neural network training directly minimizes the classification loss, such cost-sensitive learning methods are particularly well suited to datasets with a skewed distribution [49].
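As an illustration of cost-sensitive training, a class-weighted cross-entropy over a batch of N samples with logits z_n, true classes y_n and class weights w_c can be written as L = -(Σ_n w_{y_n} log softmax(z_n)_{y_n}) / (Σ_n w_{y_n}). The normalization by the total weight follows the convention of PyTorch's nn.CrossEntropyLoss with a weight argument and reduction="mean"; whether the modification used in this paper has exactly this form is an assumption. The minimal stdlib sketch below shows that the same misclassification raises the batch loss more when it occurs in an up-weighted minority class.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def weighted_cross_entropy(batch_logits, targets, class_weights):
    """Weighted cross-entropy averaged over a batch.

    Each sample's loss -log p[y] is multiplied by the weight of its true class,
    and the sum is divided by the total weight of the targets in the batch."""
    num, den = 0.0, 0.0
    for logits, y in zip(batch_logits, targets):
        p = softmax(logits)
        w = class_weights[y]
        num += -w * math.log(p[y])
        den += w
    return num / den


# Hypothetical batch: both samples get logits favoring class 0, so the
# class-0 sample is correct and the class-1 (minority) sample is wrong.
logits = [[3.0, 0.0], [3.0, 0.0]]
targets = [0, 1]
unweighted = weighted_cross_entropy(logits, targets, [1.0, 1.0])
weighted = weighted_cross_entropy(logits, targets, [1.0, 5.0])  # class 1 weighted 5x
print(unweighted, weighted)
```

With the minority class weighted five times more heavily, the error on that class dominates the batch loss, so gradient descent is pushed harder to correct it.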
The rest of the work is structured as follows.
Section 2 is divided into several subsections. Subsection 2.1 describes methods for pre-processing heterogeneous dermatological data. Subsection 2.2 describes the modification of the cross-entropy loss function using weighting factors for unbalanced dermatological data. Subsection 2.3 presents a multimodal neural network system for processing heterogeneous dermatological data with the modified cross-entropy loss function, which is sensitive to unbalanced data.
Section 3 presents the results of modeling the proposed balanced-trained multimodal neural network system for the classification of pigmented neoplasms, including the data pre-processing stage.
Section 4 discusses the results and compares them with known works on neural network classification of dermatological images. The conclusion summarizes the results of the work.
3. Results
For practical modeling, data were selected from the open archive of the International Skin Imaging Collaboration (ISIC). The ISIC Archive is an open-source platform containing publicly available dermatological data under a Creative Commons license, in which images of pigmented skin lesions are associated with patient statistics and confirmed diagnoses. The purpose of the archive is to provide open access to diagnostic dermatological data for training specialists in melanoma recognition, as well as for developing clinical decision support systems and automated diagnostics. The selected data included 41,725 dermatological images of varying size and quality, each associated with a set of statistical factors and an established diagnosis. All data were divided into 10 diagnostically significant categories: vascular lesions, nevus, solar lentigo, dermatofibroma, seborrheic keratosis, benign keratosis, actinic keratosis, basal cell carcinoma, squamous cell carcinoma, and melanoma. The categories are divided into "malignant" and "benign" groups and arranged in ascending order of risk and severity of the course of the disease. Actinic keratosis is an intraepithelial dysplasia of keratinocytes and is characterized as a "precancerous" skin lesion (in situ squamous cell carcinoma); therefore, this category was assigned to the group of "malignant" pigmented skin lesions [75]. The distribution of the selected dermatological images by category is shown in Figure 6.
The set of statistical factors for each image included the patient's gender (male/female), the age group in five-year increments, and the localization of the pigmented lesion on the body (anterior torso, head/neck, lateral torso, lower extremity, oral/genital, palms/soles, posterior torso, upper extremity). The statistical factors used for neural network modeling and their cardinality are presented in Table 1.
At the stage of preliminary processing of statistical data, the "Age" parameter was divided into four groups according to the age classification adopted by the World Health Organization (WHO). The first group, "young age", included patients aged 44 years and younger; the second group, "middle age", included patients aged 45 to 59 years; the third group, "elderly", included patients aged 60 to 74 years; and the fourth group, "long-livers", included patients aged 75 years and older. Thus, the number of possible values of the "Age" parameter was reduced from 18 to 4. The distribution of dermatological data by statistical factors is shown in Figure 7. The analysis of the statistical data showed that the predominant share of patients are men and that the most common age group is 75 years and older. Pigmented lesions are most often localized on the posterior torso. These findings are consistent with studies on the influence of statistical factors on the risk of skin cancer [20,21,22,23].
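A minimal sketch of the age grouping described above, assuming for illustration that ages are stored as multiples of five from 0 to 85, which yields the 18 original values; the group labels are hypothetical identifiers, not names taken from the authors' code.

```python
def age_group(age):
    """Map an age in years to one of the four WHO-based groups used in the text.

    The string labels are hypothetical identifiers chosen for this sketch."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 44:
        return "young"
    if age <= 59:
        return "middle"
    if age <= 74:
        return "elderly"
    return "long-livers"


# Assumed encoding: 18 five-year values 0, 5, ..., 85.
original_values = list(range(0, 90, 5))
grouped = sorted({age_group(a) for a in original_values})
print(len(original_values), len(grouped))  # 18 original values collapse to 4 groups
```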
The simulation was carried out in Python 3.11.0 on a PC with an Intel(R) Core(TM) i5-8500 3.00 GHz processor, 16 GB of RAM, and the 64-bit Windows 10 operating system. The multimodal neural network systems were trained on an NVIDIA GeForce GTX 1050 Ti graphics processing unit (GPU). The PyTorch machine learning framework was used to model the neural network systems, the NumPy, Pandas, and scikit-learn libraries were used to process statistical data, and the Matplotlib library was used for data visualization.
To model a multimodal neural network system for recognizing pigmented skin lesions that is sensitive to unbalanced data, the DenseNet_161 [76], Inception_v4 [77], and ResNeXt_50 [78] neural network architectures were used. The selected convolutional architectures were pre-trained on the ImageNet natural image dataset. These architectures are currently recognized as among the most efficient and accurate image classification architectures, with accuracy comparable to human performance [79].
At the first stage of modeling, the selected dermatological data were pre-processed. The statistical data were pre-processed by converting them into an input vector using one-hot encoding. The coding tables for each possible value of each statistical factor are shown in Figure 8. Table 2 shows the cardinality of each statistical factor after one-hot encoding. Combined with the grouping of ages into four categories, this reduced the total number of possible values of the patients' statistical factors from 28 (2 gender + 18 age + 8 localization values) to 14 (2 + 4 + 8), which is also the length of the resulting one-hot input vector.
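The one-hot encoding of the three statistical factors can be sketched as follows. The category lists mirror those named in the text, but their order (and hence the positions of the ones in the vector) is an assumption; the actual coding tables are those in Figure 8. The sketch confirms that the concatenated vector has 2 + 4 + 8 = 14 components.

```python
# Category orders are assumptions made for this sketch.
GENDERS = ["male", "female"]
AGE_GROUPS = ["young", "middle", "elderly", "long-livers"]
SITES = ["anterior torso", "head/neck", "lateral torso", "lower extremity",
         "oral/genital", "palms/soles", "posterior torso", "upper extremity"]


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def encode_metadata(gender, age_group, site):
    """Concatenate the one-hot codes of the three factors into one input vector."""
    return one_hot(gender, GENDERS) + one_hot(age_group, AGE_GROUPS) + one_hot(site, SITES)


vec = encode_metadata("female", "long-livers", "posterior torso")
print(len(vec), vec)
```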
Pre-processing of the visual data consisted of applying the hair-removal method proposed in [55]. Examples of pre-processed dermatological images are shown in Figure 9. The second step in pre-processing the visual data was resizing the input images. Most of the selected images of pigmented skin lesions from the ISIC archive are 450×600 pixels, whereas the selected neural network architectures require input images of 224×224 pixels for DenseNet_161 [76] and ResNeXt_50 [78] and 299×299 pixels for Inception_v4 [77]. Therefore, the input images were resized at the pre-processing stage. For further modeling, the dermatological database was divided in an 80:20 ratio into training and validation data. Affine transformations such as reflection, rotation, translation, and scaling were applied to the visual training data. This data augmentation helped prevent overfitting of the neural network models.
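The text specifies only an 80:20 split. With strongly unbalanced classes, a plain random split can leave very few samples of a rare class in the validation set, so the sketch below splits each class separately (stratification). Stratification is an assumption added for illustration, not a detail stated by the authors; scikit-learn's train_test_split offers the same behavior through its stratify argument. The function name and example labels are hypothetical.

```python
import random


def stratified_split(labels, train_frac=0.8, seed=42):
    """Return (train_idx, val_idx), splitting each class separately so that
    rare classes appear in both subsets in roughly the same proportion."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, val_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train_idx += idx[:cut]
        val_idx += idx[cut:]
    return sorted(train_idx), sorted(val_idx)


# Hypothetical labels: 100 nevus images and 10 melanoma images.
labels = ["nevus"] * 100 + ["melanoma"] * 10
train, val = stratified_split(labels)
print(len(train), len(val), sum(labels[i] == "melanoma" for i in val))
```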
During training, pre-processed dermatological images of pigmented skin lesions from the training set were fed to the input of the selected CNN. The vector of pre-processed statistical data from the training set was fed to the input of the developed multilayer neural network, consisting of three linear layers with ReLU activation layers. After the multimodal signals passed through the CNN and the multilayer perceptron, the output feature vectors were combined in a concatenation layer. The resulting signal was passed to a softmax layer to obtain the predicted probabilities of the 10 diagnostically significant categories. The obtained probabilities were compared with the true labels of the training data, and the error was calculated using the modified cross-entropy loss function, in which errors in less common categories are penalized more severely than errors in more common ones. As a result, the network gradually learned to reproduce the true labels, and the loss decreased during training. The calculated weight coefficients of each class used to modify the cross-entropy loss function are presented in Table 3.
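Table 3 lists the class weights actually used, but the formula behind them is not given in this section. One common choice, shown here purely as an assumption, is inverse-frequency ("balanced") weighting, w_c = N / (K·n_c), where N is the number of samples, K the number of classes and n_c the number of samples in class c; scikit-learn's compute_class_weight with class_weight="balanced" uses the same formula. The class counts in the example are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights w_c = N / (K * n_c): rarer classes get
    proportionally larger weights, and sum_c n_c * w_c equals N."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


# Hypothetical class counts (not the ISIC counts from Figure 6).
labels = ["nevus"] * 80 + ["melanoma"] * 15 + ["dermatofibroma"] * 5
weights = balanced_class_weights(labels)
print(weights)
```

In PyTorch, such weights would typically be passed as a tensor to torch.nn.CrossEntropyLoss(weight=...), ordered by class index.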
Each neural network system was trained for 7 epochs; with a larger number of epochs, pronounced overfitting of each of the proposed systems was observed. The batch size was 8. Stochastic gradient descent (SGD) with a learning rate of 0.001 and momentum of 0.9 was used as the optimizer.
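For reference, a single update of SGD with momentum, in the form used by PyTorch's torch.optim.SGD with default dampening and without weight decay or Nesterov momentum, is v ← μ·v + g, θ ← θ − η·v. The sketch below applies it with the learning rate η = 0.001 and momentum μ = 0.9 given in the text to a hypothetical one-parameter objective f(θ) = θ²; in a real model the same setting corresponds to torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9).

```python
def sgd_momentum_step(params, grads, velocity, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update (dampening 0, no weight decay, no Nesterov):
    v <- momentum * v + g, then p <- p - lr * v. Velocity starts at zero."""
    new_v = [momentum * v + g for v, g in zip(velocity, grads)]
    new_p = [p - lr * v for p, v in zip(params, new_v)]
    return new_p, new_v


# Hypothetical objective f(p) = p**2 with gradient 2p, starting from p = 1.
p, v = [1.0], [0.0]
for _ in range(2):
    g = [2.0 * p[0]]
    p, v = sgd_momentum_step(p, g, v)
print(p, v)
```

The accumulated velocity makes the second step larger than the first even though the gradient shrank, which is the smoothing effect momentum adds to the small learning rate.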
Table 4 presents the test accuracy of the proposed multimodal neural network systems that are sensitive to unbalanced dermatological data.
Table 5 presents the loss values obtained when testing the proposed multimodal neural network systems. The results are compared with those of the original multimodal systems, which are not sensitive to unbalanced data.
The simulation showed that modifying the cross-entropy loss function with weight coefficients improved recognition accuracy and reduced the loss value. The highest recognition accuracy, 85.19%, was obtained by the proposed multimodal neural network system sensitive to unbalanced data based on the DenseNet_161 architecture. Each of the proposed systems sensitive to unbalanced dermatological data achieved higher test accuracy than the corresponding original multimodal system, with an improvement of 1.02 to 4.03 percentage points depending on the pre-trained CNN. The lowest loss value, 0.1344, was also obtained by the system sensitive to unbalanced data based on DenseNet_161. In all cases, the loss of the proposed systems with the modified cross-entropy loss function was lower than that of the original multimodal systems, with a decrease of 0.0123 to 0.1219 depending on the pre-trained CNN.
Table 6 presents the values of various quantitative metrics for evaluating the neural network systems.
For the statistical evaluation of the trained models, the following metrics were chosen: specificity, sensitivity, F1 score, Matthews correlation coefficient (MCC), false negative rate (FNR), false positive rate (FPR), negative predictive value (NPV), and positive predictive value (PPV). When evaluating intelligent systems for assisted dermatological diagnostics, sensitivity indicates how well the system identifies malignant skin lesions in patients who actually have pigmented oncopathology; the higher the sensitivity, the fewer malignant lesions the system misses. Testing showed that the highest sensitivity, 0.8519, belongs to the proposed system based on the DenseNet_161 architecture with the modified cross-entropy loss function. Specificity indicates how well the system identifies patients with benign pigmented neoplasms. The best specificity, 0.9835, was also obtained by the multimodal system sensitive to unbalanced data based on DenseNet_161. The F1 score is the harmonic mean of positive predictive value and sensitivity. The best F1 score, 0.8519, was again obtained by the proposed DenseNet_161-based system with the modified loss function. However, the F1 score depends on the ratio of positive to negative cases and does not always evaluate correctly systems trained on clearly unbalanced data. MCC is a more reliable measure for such systems: a high MCC indicates that the system performs well in all four cells of the confusion matrix, in proportion to the number of benign and malignant cases in the dataset [80]. The best MCC, 0.7169, was obtained by the DenseNet_161-based multimodal system sensitive to unbalanced data.
The false negative rate (FNR) is the proportion of malignant lesions that the system classifies as benign, and the false positive rate (FPR) is the proportion of benign lesions classified as malignant; in hypothesis-testing terms, with "benign" as the null hypothesis, they are the probabilities of type II and type I errors, respectively. The positive predictive value (PPV) is the proportion of malignant predictions that are truly malignant, and the negative predictive value (NPV) is the proportion of benign predictions that are truly benign. Among all trained systems, the best values of FNR, FPR, NPV, and PPV (0.1481, 0.0164, 0.9835, and 0.8519, respectively) were obtained by the DenseNet_161-based system sensitive to unbalanced data. For all the considered metrics, the systems trained with the modified cross-entropy loss function outperformed the original multimodal systems for recognizing pigmented skin lesions. Thus, using the modified cross-entropy loss function to train multimodal neural network systems yielded classifiers that are sensitive to unbalanced dermatological data.
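The binary metrics listed above can all be computed from a 2×2 confusion matrix of the kind shown in Figures 13 and 14, with "malignant" taken as the positive class. The following sketch uses hypothetical counts, not the values from the paper; how the authors averaged the metrics over the ten classes is not specified here, so only the two-class case is shown. The function name is illustrative, and the sketch does not guard against empty rows or columns.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Metrics from a 2x2 confusion matrix with 'malignant' as the positive class."""
    sens = tp / (tp + fn)  # sensitivity (recall, true positive rate)
    spec = tn / (tn + fp)  # specificity (true negative rate)
    ppv = tp / (tp + fp)   # positive predictive value (precision)
    npv = tn / (tn + fn)   # negative predictive value
    f1 = 2 * ppv * sens / (ppv + sens)  # harmonic mean of PPV and sensitivity
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"sensitivity": sens, "specificity": spec, "PPV": ppv, "NPV": npv,
            "F1": f1, "MCC": mcc, "FNR": 1 - sens, "FPR": 1 - spec}


# Hypothetical counts: 100 malignant and 900 benign test cases.
m = binary_metrics(tp=80, fn=20, fp=10, tn=890)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Note that FNR = 1 − sensitivity and FPR = 1 − specificity, which matches the reported pairs 0.8519/0.1481 and 0.9835/0.0164 up to rounding.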
Figures 10, 11, and 12 show the confusion matrices obtained when testing the multimodal neural network systems. The diagnostic categories are arranged in order of increasing risk and severity of the course of the disease.
Figures 13 and 14 show the confusion matrices for testing the multimodal neural network systems on two categories (benign and malignant).
The analysis of the confusion matrices shows that using the modified cross-entropy loss function to train the multimodal neural network systems reduces the number of false positive and false negative predictions. For intelligent systems for auxiliary medical diagnostics, reducing false negative predictions is critical. The modified cross-entropy loss function reduced the number of false negative recognitions of pigmented skin lesions by 468 cases for the DenseNet_161-based system (the largest reduction), by 36 cases for the Inception_v4-based system, and by 51 cases for the ResNeXt_50-based system.
The McNemar test calculations in Figure 15 showed that, with the modified cross-entropy loss function used at the training stage, the system gave a correct recognition in 497 to 1280 cases in which the original multimodal system made an error. At the same time, in 119 to 204 cases the system sensitive to unbalanced data made an error where the original system was correct.
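McNemar's test compares two classifiers evaluated on the same cases using only the discordant counts: b, the cases the original system got wrong and the weighted system got right, and c, the opposite. With Edwards' continuity correction the statistic is χ² = (|b − c| − 1)² / (b + c), compared against a chi-square distribution with one degree of freedom. The sketch below is illustrative; the example counts 497 and 204 are taken from the ends of the ranges reported above and do not necessarily belong to the same architecture, so the exact per-model counts should be read from Figure 15.

```python
import math


def mcnemar(b, c, correction=True):
    """McNemar's chi-square test on the discordant counts of two classifiers.

    b: cases the first model got wrong and the second got right
    c: cases the first model got right and the second got wrong
    Returns (statistic, p_value) for a chi-square distribution with 1 df."""
    if b + c == 0:
        raise ValueError("no discordant pairs")
    diff = abs(b - c) - (1 if correction else 0)
    stat = diff ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(Z**2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Illustrative discordant counts (see the caveat above).
stat, p = mcnemar(497, 204)
print(f"chi2 = {stat:.2f}, p = {p:.3g}")
```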
By penalizing errors in the rare categories more severely during the training of the multimodal neural network, it was possible to obtain a system that is sensitive to unbalanced data. However, because of the risk of false negative errors, the proposed system cannot be used as an independent diagnostic tool.
4. Discussion
The paper presents a multimodal neural network system with a modified cross-entropy loss function that is sensitive to unbalanced heterogeneous dermatological data. The accuracy of the proposed system based on the DenseNet_161 convolutional architecture was 85.19%. The system analyzes heterogeneous dermatological data consisting of images of pigmented skin lesions and statistical information such as gender, age, and location of the pigmented lesion on the body. The publicly available dermatological training data are highly unbalanced towards the "benign" categories. Modifying the cross-entropy loss function made it possible to counteract this imbalance and achieve higher accuracy than the original multimodal systems, as well as higher accuracy than similar systems for detecting malignant skin lesions.
Table 7 compares the accuracy of recognizing pigmented skin neoplasms achieved by the proposed system, sensitive to unbalanced data, with that of similar multimodal systems.
The work [81] presents a method for intelligent recognition of heterogeneous data, such as clinical images and statistical metadata. The modeling was carried out on a dataset of 2917 clinical cases divided into five diagnostically significant categories. The average test accuracy of multi-class classification with the ResNet-50 multimodal architecture was 71.9%, which is 13.03 percentage points lower than that of the proposed multimodal system with the similar ResNeXt_50 architecture trained with the modified cross-entropy loss function, and 13.29 percentage points lower than that of the proposed system with the DenseNet_161 architecture, the most accurate of the proposed systems. The use of more training data, together with pre-processing methods and the modified cross-entropy loss function, made it possible to significantly increase the accuracy of dermatological data recognition compared with similar systems.
The work [82] presents the multimodal neural network system CAFNet, which analyzes heterogeneous dermatological data such as dermoscopic and clinical images. CAFNet uses two architectures to extract features from dermoscopic and clinical images and a neural network architecture to analyze the features. CAFNet achieves an average accuracy of 76.80% on a test dataset of 7 diagnostically relevant categories of pigmented skin lesions, which is 6.94% higher than the accuracy of recognizing pigmented skin lesions with the ResNet-50 CNN. Despite this significant increase in accuracy from using heterogeneous visual data, the CAFNet test accuracy is 8.39 percentage points lower than that of the proposed multimodal neural network system with the modified cross-entropy loss function based on the DenseNet_161 architecture. The joint use of visual data and patient statistics made it possible to identify additional relationships between the diagnosis and the pigmented neoplasm, thereby increasing the accuracy of intelligent diagnostics. The input data pre-processing stage also significantly improved the quality of the information processed by the artificial intelligence system.
The work [83] presents MDFNet, a multi-modal data fusion diagnostic network that combines heterogeneous features of clinical skin images and patients' clinical data. The experimental results showed that MDFNet achieves an accuracy of 80.42% on the test data, about 9% higher than that of a neural network model using only dermatological images. The system was modeled on 2298 clinical cases divided into six categories of pigmented skin lesions, with ResNet_50 and DenseNet_121 as the neural network architectures. MDFNet based on ResNet_50 achieved a test classification accuracy of 77.11%, which is 7.82 percentage points lower than that of the proposed multimodal system based on the similar ResNeXt_50 CNN. MDFNet based on DenseNet_121 achieved a test accuracy of 80.42%, which is 4.77 percentage points lower than that of the proposed multimodal system based on the similar DenseNet_161 CNN. Training with the cross-entropy loss function modified by weighting coefficients made it possible to obtain a classifier that is sensitive to unbalanced dermatological data and to reduce the frequency of false negative errors, in which malignant pigmented neoplasms are recognized as benign.
The proposed multimodal system trained with the modified cross-entropy loss function exceeds the 65-75% accuracy reported for visual diagnosis by experienced dermatologists [27,28]. Comparisons of the accuracy of classifying pigmented skin lesions by dermatologists with different levels of experience and by artificial intelligence systems were presented in [84,85,86]. However, the developed multimodal neural network system, which is sensitive to unbalanced data, cannot replace the decisive opinion of a specialist. The proposed system can only be used as an additional diagnostic tool because of the risk of a false negative response, in which a malignant neoplasm is recognized as benign. Therefore, a promising direction for further research is the construction of more complex ensemble systems for neural network analysis of dermatological data. Another promising direction is the introduction of segmentation at the stage of pre-processing visual data: semantic segmentation makes it possible to highlight the contour of a pigmented neoplasm, whose irregularity is a diagnostic morphological manifestation of oncopathology. The development of web applications and computer programs for the healthcare sector as auxiliary tools for diagnosing oncopathologies is also relevant.
Figure 1.
Multimodal neural network system with a modified cross-entropy loss function, sensitive to unbalanced heterogeneous dermatological data.
Figure 2.
An example of the step-by-step operation of the method of pre-cleaning of hair structures on dermatological images.
Figure 3.
Scheme for processing dermatological statistical data using the one-hot encoding method.
Figure 4.
Scheme of using a modified cross-entropy loss function for training a multimodal neural network system for recognizing pigmented skin lesions.
Figure 5.
The architecture of the proposed multimodal neural network system for recognizing pigmented skin lesions with a modified cross-entropy loss function.
Figure 6.
Graph of the distribution of selected dermatological images into diagnostically relevant categories.
Figure 7.
Graph of the distribution of selected dermatological data by statistical factors of patients: a) by gender, b) by age, c) by localization of the pigmented lesion on the patient's body.
Figure 8.
Coding tables of statistical parameters of patients using the one-hot encoding method: a) gender, b) age, c) localization of the pigmented lesion on the patient's body.
Figure 9.
Examples of dermatological images pre-processed with the hair removal method.
Figure 10.
Confusion matrices as a result of testing a multimodal neural network system based on the DenseNet_161 architecture: a) original multimodal neural network system; b) multimodal neural network system with a modified cross-entropy loss function.
Figure 11.
Confusion matrices as a result of testing a multimodal neural network system based on the Inception_v4 architecture: a) original multimodal neural network system; b) multimodal neural network system with a modified cross-entropy loss function.
Figure 12.
Confusion matrices obtained by testing the multimodal neural network system based on the ResNeXt_50 architecture: a) original multimodal neural network system; b) multimodal neural network system with a modified cross-entropy loss function.
Figure 13.
Two-category confusion matrices obtained by testing the original multimodal neural network system based on the architectures: a) DenseNet_161; b) Inception_v4; c) ResNeXt_50.
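The caption does not list which diagnoses form each of the two categories. For illustration only, assume a split into malignant and other lesions, with basal cell carcinoma, squamous cell carcinoma and melanoma treated as malignant. Under that assumption, a 10-class confusion matrix (rows = true class, columns = predicted class, class order as in Table 3) collapses into a 2×2 matrix by summing the corresponding blocks. The grouping `MALIGNANT` and the function name are hypothetical.

```python
import numpy as np

# Class order follows Table 3.
CLASSES = (
    "Vascular lesions", "Nevus", "Solar lentigo", "Dermatofibroma",
    "Seborrheic keratosis", "Benign keratosis", "Actinic keratosis",
    "Basal cell carcinoma", "Squamous cell carcinoma", "Melanoma",
)
# HYPOTHETICAL grouping used only for this example.
MALIGNANT = {"Basal cell carcinoma", "Squamous cell carcinoma", "Melanoma"}


def collapse_to_binary(cm, classes=CLASSES, positive=MALIGNANT):
    """Sum a K x K confusion matrix (rows = true, columns = predicted) into a
    2 x 2 matrix: index 0 = other categories, index 1 = positive group."""
    cm = np.asarray(cm)
    group = np.array([1 if name in positive else 0 for name in classes])
    binary = np.zeros((2, 2), dtype=cm.dtype)
    for true_g in (0, 1):
        for pred_g in (0, 1):
            block = cm[np.ix_(group == true_g, group == pred_g)]
            binary[true_g, pred_g] = block.sum()
    return binary


# Toy matrix: 5 correct predictions per class, plus 2 melanomas called "Nevus".
cm = 5 * np.eye(10, dtype=int)
cm[9, 1] = 2
print(collapse_to_binary(cm).tolist())  # [[35, 0], [2, 15]]
```

Misclassifications inside the same group (for example, melanoma predicted as basal cell carcinoma) become correct in the 2×2 view. This is why two-category matrices usually look better than their 10-class counterparts.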
Figure 14.
Two-category confusion matrices obtained by testing the multimodal neural network system with a modified cross-entropy loss function based on the architectures: a) DenseNet_161; b) Inception_v4; c) ResNeXt_50.
Figure 15.
Classification tables used for McNemar's test comparing the multimodal neural network systems for recognizing pigmented skin lesions based on the architectures: a) DenseNet_161; b) Inception_v4; c) ResNeXt_50.
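McNemar's test compares two classifiers evaluated on the same test images using only the discordant cells of a 2×2 table. Here b is the number of images classified correctly by one system only, and c is the number classified correctly by the other system only. Assuming the Figure 15 tables follow this standard layout, the sketch below computes the chi-squared statistic with Edwards' continuity correction, (|b - c| - 1)² / (b + c). Its p-value comes from the chi-squared distribution with one degree of freedom, whose survival function equals erfc(√(x/2)). The function name and the example counts are hypothetical. `statsmodels.stats.contingency_tables.mcnemar` provides a fuller implementation, including an exact binomial variant.

```python
import math


def mcnemar_test(b, c, continuity=True):
    """Chi-squared McNemar test on the discordant counts of a 2x2 table.

    b: images classified correctly only by system A
    c: images classified correctly only by system B
    Returns (statistic, p_value) for 1 degree of freedom."""
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # Edwards' continuity correction
    statistic = diff ** 2 / (b + c)
    # Survival function of chi-squared with 1 dof: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts, not taken from Figure 15.
stat, p = mcnemar_test(b=5, c=25)
print(f"chi2 = {stat:.3f}, p = {p:.5f}")
```

Concordant cells (both systems right or both wrong) do not enter the statistic. Only disagreements carry information about which system is better.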
Table 1.
Cardinality of each statistical factor selected for modeling from the dermatological database.
| № | Statistical factor | Cardinality |
|---|---|---|
| 1 | Gender | 2 |
| 2 | Age | 18 |
| 3 | Localization on the body | 8 |
| | TOTAL | 28 |
Table 2.
Cardinality of each pre-processed statistical factor selected for modeling from the dermatological database.
| № | Statistical factor | Cardinality |
|---|---|---|
| 1 | Gender | 2 |
| 2 | Age | 4 |
| 3 | Localization on the body | 8 |
| | TOTAL | 14 |
Table 3.
Weight coefficients used to modify the cross-entropy loss function in the multimodal neural network system.
| № | Diagnostic category | Weight coefficient |
|---|---|---|
| 1 | Vascular lesions | 3.8893 |
| 2 | Nevus | 0.0353 |
| 3 | Solar lentigo | 3.6444 |
| 4 | Dermatofibroma | 3.9992 |
| 5 | Seborrheic keratosis | 0.6721 |
| 6 | Benign keratosis | 0.8954 |
| 7 | Actinic keratosis | 1.1323 |
| 8 | Basal cell carcinoma | 0.2900 |
| 9 | Squamous cell carcinoma | 1.5000 |
| 10 | Melanoma | 0.1758 |
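The coefficients in Table 3 act as per-class multipliers on the loss. One common form of weighted cross-entropy, and the one computed by PyTorch's `torch.nn.CrossEntropyLoss(weight=...)` with the default `reduction='mean'`, is L = -(Σ_i w_{y_i} log p_{i,y_i}) / (Σ_i w_{y_i}). Here y_i is the true class of sample i and p_{i,y_i} is the softmax probability assigned to it. The NumPy sketch below assumes this form, since the source does not give the exact formula. The function names and the random logits are hypothetical.

```python
import numpy as np

# Weight coefficients from Table 3, in table order
# (index 0 = Vascular lesions, ..., index 9 = Melanoma).
CLASS_WEIGHTS = np.array([3.8893, 0.0353, 3.6444, 3.9992, 0.6721,
                          0.8954, 1.1323, 0.2900, 1.5000, 0.1758])


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def weighted_cross_entropy(logits, targets, weights=CLASS_WEIGHTS):
    """Weighted mean of per-sample negative log-likelihoods, normalised by
    the summed weights of the true classes (PyTorch 'mean' convention)."""
    targets = np.asarray(targets)
    nll = -log_softmax(logits)[np.arange(len(targets)), targets]
    w = weights[targets]
    return float((w * nll).sum() / w.sum())


rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 10))
targets = rng.integers(0, 10, size=8)
print(weighted_cross_entropy(logits, targets))
```

Because the result is normalised by the summed weights, a sample from a class with a large coefficient (for example, dermatofibroma at 3.9992) pulls the loss toward its own error far more than a sample from a class with a small one (nevus at 0.0353).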
Table 4.
Accuracy results from testing the proposed multimodal neural network system that is sensitive to unbalanced dermatological data.
| CNN architecture | Original multimodal neural network system, % | Multimodal neural network system with a modified cross-entropy loss function, % | Difference in recognition accuracy between original and proposed multimodal neural network systems, percentage points |
|---|---|---|---|
| DenseNet_161 [76] | 81.15 | 85.19 | 4.04 |
| Inception_v4 [77] | 82.42 | 83.86 | 1.44 |
| ResNeXt_50 [78] | 83.91 | 84.93 | 1.02 |
Table 5.
Loss function values from testing the proposed multimodal neural network system that is sensitive to unbalanced dermatological data.
| CNN architecture | Original multimodal neural network system | Multimodal neural network system with a modified cross-entropy loss function | Difference in loss function value between original and proposed multimodal neural network systems |
|---|---|---|---|
| DenseNet_161 [76] | 0.2563 | 0.1344 | 0.1219 |
| Inception_v4 [77] | 0.2087 | 0.1964 | 0.0123 |
| ResNeXt_50 [78] | 0.1843 | 0.1475 | 0.0368 |
Table 6.
Results of testing the multimodal neural network systems with quantitative assessment metrics.
| CNN architecture | Loss function weights | Specificity | Sensitivity | F-1 score | MCC | FNR | FPR | NPV | PPV | Simulation time, hh:mm:ss |
|---|---|---|---|---|---|---|---|---|---|---|
| DenseNet_161 [76] | Not used | 0.9791 | 0.8115 | 0.8115 | 0.6543 | 0.1884 | 0.0209 | 0.9791 | 0.8115 | 14:02:18 |
| DenseNet_161 [76] | Used | 0.9835 | 0.8519 | 0.8519 | 0.7169 | 0.1481 | 0.0164 | 0.9835 | 0.8519 | 13:54:55 |
| Inception_v4 [77] | Not used | 0.9821 | 0.8397 | 0.8397 | 0.6929 | 0.1602 | 0.0178 | 0.9821 | 0.8397 | 09:28:24 |
| Inception_v4 [77] | Used | 0.9833 | 0.8494 | 0.8494 | 0.7165 | 0.1506 | 0.0167 | 0.9833 | 0.8494 | 10:52:07 |
| ResNeXt_50 [78] | Not used | 0.9795 | 0.8156 | 0.8156 | 0.6457 | 0.1844 | 0.0205 | 0.9795 | 0.8156 | 11:47:05 |
| ResNeXt_50 [78] | Used | 0.9821 | 0.8391 | 0.8391 | 0.6846 | 0.1616 | 0.0179 | 0.9820 | 0.8391 | 10:12:15 |
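In Table 6, sensitivity, F-1 score and PPV coincide in every row, and so do specificity and NPV. This is the pattern produced by micro-averaging over the one-vs-rest 2×2 tables of a multiclass confusion matrix. Every misclassification counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so the pooled FP and FN totals are equal. Pooled sensitivity, PPV and F1 then all equal overall accuracy, and with K classes pooled specificity equals (K - 2 + accuracy) / (K - 1). For K = 10 and an accuracy of 0.8115 this gives 0.9791, which matches the DenseNet_161 row. The sketch below assumes this averaging scheme, which the section does not state explicitly. The function name is hypothetical, and MCC is omitted because its multiclass definition varies between tools.

```python
import numpy as np


def micro_ovr_metrics(cm):
    """Micro-averaged one-vs-rest metrics from a K x K confusion matrix
    (rows = true class, columns = predicted class)."""
    cm = np.asarray(cm, dtype=np.float64)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as class k, truly another class
    fn = cm.sum(axis=1) - tp  # truly class k, predicted as another class
    tn = cm.sum() - tp - fp - fn
    tp_sum, fp_sum, fn_sum, tn_sum = tp.sum(), fp.sum(), fn.sum(), tn.sum()
    sensitivity = tp_sum / (tp_sum + fn_sum)
    ppv = tp_sum / (tp_sum + fp_sum)
    return {
        "accuracy": tp_sum / cm.sum(),
        "sensitivity": sensitivity,
        "specificity": tn_sum / (tn_sum + fp_sum),
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
        "fnr": fn_sum / (tp_sum + fn_sum),
        "fpr": fp_sum / (fp_sum + tn_sum),
        "npv": tn_sum / (tn_sum + fn_sum),
        "ppv": ppv,
    }


# Toy 3-class matrix (hypothetical counts).
cm = [[50, 3, 2],
      [4, 30, 6],
      [1, 2, 40]]
for name, value in micro_ovr_metrics(cm).items():
    print(f"{name:12s}{value:.4f}")
```

Per-class (macro) averaging would break these equalities. Reporting both views makes it clear whether a small, heavily weighted class is actually recognized well.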
Table 7.
Accuracy results from testing various multimodal neural network systems for recognizing pigmented skin lesions.
| Multimodal neural network system for recognizing pigmented skin lesions | Accuracy of recognition of pigmented neoplasms of the skin, % |
|---|---|
| Known neural network system [81] | 71.90 |
| Known neural network system [82] | 76.80 |
| Known neural network system [83] | 80.42 |
| The proposed multimodal neural network system based on the DenseNet_161 architecture | 85.19 |