This study conducted model training and evaluation using the Keras package and the Python programming language in the Jupyter Notebook environment. The pre-processed data were divided into an 80% training set and a 20% test set and fed into the novel MSHA model. The model was validated using a 20% subset of the training data and trained for 50 epochs. We applied dropout regularization after the third max pooling layer and in the dense layers. Dropout is an easy-to-use regularization technique that randomly turns off some neurons during training, which yields a simpler and more efficient network. A simpler network is less complex and, in turn, less prone to overfitting. Two callbacks, EarlyStopping and ReduceLROnPlateau, were implemented to improve the training process, optimize model performance, and prevent overfitting. The EarlyStopping callback monitors a chosen quantity and stops the training process early if it does not improve for a certain number of epochs [17]. In our case, we monitored the validation loss. The ReduceLROnPlateau callback reduces the learning rate when the validation loss does not improve for a certain number of epochs [18]. In this case, patience=3.
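The two callbacks described above could be configured in Keras roughly as follows. This is a minimal sketch, not the study's actual code. Only `monitor="val_loss"` and `patience=3` come from the text, and the patience value is read here as applying to ReduceLROnPlateau. The EarlyStopping patience, `restore_best_weights`, `factor`, and the helper name `build_callbacks` are assumptions made for illustration.

```python
# Settings for the two callbacks. Only monitor="val_loss" and patience=3 are
# taken from the text; every other value is an illustrative assumption.
EARLY_STOPPING_ARGS = {
    "monitor": "val_loss",
    "patience": 5,                  # assumed; not stated in the text
    "restore_best_weights": True,   # assumed
}
REDUCE_LR_ARGS = {
    "monitor": "val_loss",
    "patience": 3,                  # from the text
    "factor": 0.1,                  # assumed (the Keras default)
}


def build_callbacks():
    """Return the EarlyStopping and ReduceLROnPlateau callbacks (needs TensorFlow)."""
    # Imported lazily so the settings above can be inspected without TensorFlow.
    from tensorflow import keras

    return [
        keras.callbacks.EarlyStopping(**EARLY_STOPPING_ARGS),
        keras.callbacks.ReduceLROnPlateau(**REDUCE_LR_ARGS),
    ]
```

The returned list would be passed as `callbacks=build_callbacks()` to `model.fit`, together with `epochs=50` and `validation_split=0.2` to mirror the setup described above.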
3.1. Performance Evaluation Metrics
Performance evaluation metrics are crucial in developing, testing, and deploying machine learning models, allowing for more accurate and effective AI solutions [19]. They provide a way to quantitatively measure the accuracy, precision, sensitivity, specificity, and other aspects of a model’s performance. With these metrics, it is possible to determine how well a model performs or to compare the performance of different models. They also help improve the transparency and interpretability of machine learning models, which are essential for building trust in these systems [20].
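For reference, the metrics named above are conventionally defined from the confusion-matrix counts as follows. These are the standard textbook definitions, given here as an assumption about how the reported values were computed. TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives.

```latex
\begin{align}
\text{Precision}   &= \frac{TP}{TP + FP} \\
\text{Sensitivity} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{F1 score}    &= \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \\
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN}
\end{align}
```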
In this study, the performance of the novel MSHA model was evaluated using these metrics. The results indicate that the MSHA model was highly accurate in detecting and classifying ACC and kidney tumors, with precision of 97.00% and 95.00%, respectively, as shown in Table 3. The high precision indicates that most images the model labeled as ACC or kidney tumor were labeled correctly. The sensitivity of 94.00% for ACC and 97.00% for kidney tumors shows that the model correctly identified most positive cases. This indicates a highly sensitive model with promising clinical significance for accurately detecting ACC and kidney tumors in CT images. The specificity of 96.80% for ACC and 94.50% for kidney tumors shows that the model correctly identified most negative cases, so it can reliably tell when the tumor is absent. Furthermore, the F1 score of 96.00% for both ACC and kidney tumors indicates that the model balances precision and sensitivity well: it identifies the positive cases (ACC and kidney tumor) while keeping false positives low. Finally, the accuracy of 95.65% indicates that the model classified the CT images of ACC and kidney tumors with a high degree of accuracy.
The high performance of the MSHA model in detecting and classifying CT images of ACC and kidney tumors has significant clinical implications for rapid and accurate diagnosis, particularly in regions with limited access to specialized medical care and facilities. These results highlight the potential of the MSHA model as a valuable tool for detecting and classifying ACC and kidney tumors using CT images.
Table 3. The Performance Evaluation Metrics for the MSHA Model.
| Class | Precision % | Sensitivity % | Specificity % | F1 Score % | Accuracy % |
|---|---|---|---|---|---|
| ACC | 97.00 | 94.00 | 96.80 | 96.00 | 95.65 |
| Kidney tumor | 95.00 | 97.00 | 94.50 | 96.00 | |
3.3. Learning Curve
Model accuracy and model loss learning curves are important tools for evaluating the performance of deep learning models. They provide information about the accuracy and loss of a model over time during the training process, which can help identify potential issues with the model’s performance and guide improvements to the model [5]. The model accuracy learning curve shows the model’s accuracy on the training and validation datasets over time. It can reveal whether the model is overfitting or underfitting the training data [22]. An overfit model will have high accuracy on the training data but low accuracy on the validation data, indicating that it needs to generalize better to new and unseen data [23]. An underfit model will have low accuracy on both the training and validation data, indicating that it needs to learn the patterns in the data better [24]. Monitoring the model accuracy learning curve makes it possible to identify the optimal number of epochs to train the model and ensure that it is not overfitting or underfitting.
The MSHA model produced a good fit and learned the underlying patterns in the data without overfitting or underfitting. The increase in training accuracy over the epochs indicates that the model is learning and improving. As shown in Figure 5, the training accuracy rose slowly from epoch 0 and then increased steadily to approximately 92.7% between epochs 44 and 50. It is equally important to evaluate the model’s performance on the validation data, which stands in for new, unseen data, to confirm that the model generalizes beyond the training data. Like the training accuracy, the validation accuracy starts slowly and shows a slight irregularity. However, it increases steadily until it plateaus at 94.7% at epoch 50, indicating that further training may not improve the model’s performance on new data.
The model loss learning curve shows the change in the loss function of a model as it trains over multiple epochs. The training loss decreases over time as the model learns to fit the data better, which is a good sign. The training loss started at 0.6919 at epoch 1 and decreased steadily to 0.1805 at epoch 50. This indicates that the model improves its ability to predict the correct output with less error. The validation loss, on the other hand, decreased steadily from epoch 1 to epoch 11, fluctuated between epochs 14 and 43, and then stabilized at epoch 45, after which it continued to decrease steadily. The validation loss measures the difference between the predicted and actual outputs on data that the model has not seen during training. Thus, it estimates the model’s performance on new data. A good fit is achieved when both the training and validation losses are low and have stabilized over several epochs. Low training loss combined with high validation loss indicates overfitting, while high values of both losses may suggest underfitting [5]. Therefore, analyzing the model accuracy and model loss learning curves is crucial in evaluating a model’s performance and deciding how to improve it.
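The diagnostic rule described above (low, stable losses suggest a good fit; low training loss with high validation loss suggests overfitting; high values of both suggest underfitting) can be sketched as a small helper. The function name `diagnose_fit` and the thresholds `high_loss` and `max_gap` are hypothetical choices for illustration, since the text does not give numeric cut-offs.

```python
def diagnose_fit(train_loss, val_loss, high_loss=0.5, max_gap=0.1):
    """Label a training run from its final training and validation losses.

    The thresholds are illustrative assumptions, not values from the study.
    """
    if train_loss >= high_loss and val_loss >= high_loss:
        return "underfitting"   # both losses remain high
    if val_loss - train_loss > max_gap:
        return "overfitting"    # validation loss is well above training loss
    return "good fit"           # both losses are low and close together


# The training loss is the final value reported in the text;
# the validation loss is a made-up example value.
print(diagnose_fit(train_loss=0.1805, val_loss=0.20))
```

In practice the two inputs would be the last entries of `history.history["loss"]` and `history.history["val_loss"]` returned by a Keras `model.fit` call.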
Figure 5.
The Model Accuracy and Loss of the MSHA Model.
3.5. Comparative Evaluation with State-of-the-Art Transfer Learning Techniques
Comparing our model with other state-of-the-art transfer learning techniques such as ResNet50, VGG16, VGG19, and InceptionV3 provides a benchmark for evaluation, assesses generalizability, contributes to advancements in the field, and aids decision-making for practical applications. Furthermore, these models are well-known and widely used transfer learning techniques in computer vision with remarkable performance and significant contributions to image recognition tasks. By comparing the MSHA model with these well-established models, we can effectively assess its performance, competitiveness, and potential superiority. This comparison will help position the novel MSHA model within the context of existing state-of-the-art approaches and establish its credibility and relevance in computer vision.
As with the MSHA model, all the state-of-the-art transfer learning techniques were implemented in Jupyter Notebook. The last layers were frozen during the fine-tuning process to preserve learned representations and prevent them from being modified or overwritten. This lets the model focus on adapting its remaining parameters to the new task and reduces the number of parameters that need to be updated, making training faster and more efficient. Furthermore, three dense layers with the ReLU activation function were added before the final output layer, which uses the sigmoid activation function. Finally, each model was trained for 50 epochs, with callbacks used to prevent overfitting.
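The transfer-learning setup described above might look roughly like the following in Keras. This is a hedged sketch rather than the authors' code. It assumes that the pretrained convolutional base is the part kept frozen, and the choice of VGG16, the input size, the pooling layer, the dense-layer widths in `HEAD_UNITS`, the optimizer, and the helper name `build_transfer_model` are all assumptions. Only the three ReLU dense layers and the sigmoid output layer come from the text.

```python
HEAD_UNITS = (256, 128, 64)   # assumed widths of the three ReLU dense layers
INPUT_SHAPE = (224, 224, 3)   # assumed input size


def build_transfer_model(weights="imagenet"):
    """Build a frozen pretrained base with a small dense head (needs TensorFlow)."""
    # Imported lazily so the constants above can be read without TensorFlow.
    from tensorflow import keras

    base = keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=INPUT_SHAPE
    )
    base.trainable = False  # keep the pretrained representations fixed (assumption)

    layers = [base, keras.layers.GlobalAveragePooling2D()]
    layers += [keras.layers.Dense(units, activation="relu") for units in HEAD_UNITS]
    layers.append(keras.layers.Dense(1, activation="sigmoid"))  # binary output

    model = keras.Sequential(layers)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Swapping `keras.applications.VGG16` for `VGG19`, `ResNet50`, or `InceptionV3` would give the other baselines under the same assumptions.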
ResNet50 is a convolutional neural network (CNN) architecture known for its deep residual learning framework. It addresses the problem of vanishing gradients in very deep networks, allowing for the training of extremely deep models. It has been successful in various image classification challenges and is renowned for its ability to capture intricate features from images [28]. VGG16 and VGG19 are deep CNN architectures developed by the Visual Geometry Group (VGG) at the University of Oxford. These models are characterized by their uniform architecture, consisting of multiple stacked convolutional layers followed by fully connected layers. VGG16 and VGG19 are known for their excellent performance on large-scale image classification tasks, exhibiting high accuracy due to their deep and fine-grained feature extraction capabilities [29]. InceptionV3 belongs to the Inception family that began with GoogLeNet, which introduced inception modules that efficiently capture multi-scale features by applying parallel convolutions with different filter sizes. This architecture reduces the computational complexity while maintaining high accuracy. InceptionV3 has been widely used in various image recognition tasks and has demonstrated excellent object recognition and localization performance [30].
The comparative analysis of the novel MSHA model with state-of-the-art models reveals significant variation in performance, with differing clinical implications. The MSHA model outperforms ResNet50, VGG16, VGG19, and InceptionV3 in terms of accuracy, precision, sensitivity, specificity, F1 score, AUC, and loss. It achieves an accuracy of 96.65%, significantly higher than the other models, as shown in Table 4. Its F1 score of 96.0% indicates a superior balance between precision and sensitivity. Additionally, the MSHA model achieves an AUC value of 0.99, reflecting excellent discriminative ability, and has the lowest loss value, 0.108, indicating better optimization and fewer errors.
The improved accuracy of the MSHA model is clinically significant. With a 96.65% accuracy in correctly classifying CT images of ACC and kidney tumors, the MSHA model provides reliable and precise results. This high level of accuracy can greatly benefit oncologists and healthcare professionals involved in diagnosing and treating these types of tumors. It reduces the chances of misclassification, enabling early detection and appropriate intervention and improving patient outcomes.
By outperforming other models across various metrics, the MSHA model offers a more robust and accurate tool for assisting oncologists in making critical decisions. Its higher precision, sensitivity, and specificity values ensure better identification of positive cases and accurate exclusion of negative cases. The model’s superior F1 score indicates a well-balanced trade-off between precision and sensitivity, striking an optimal equilibrium in tumor classification. Moreover, the high AUC value of 0.99 signifies its excellent discriminative ability, distinguishing between ACC and kidney tumors with high confidence. The MSHA model’s lower loss value demonstrates effective error minimization and optimization, enhancing its overall performance.
The superior performance of the novel MSHA model can be attributed to several factors, such as its unique architectural design, effective training strategy, and better capability to learn and represent the relevant features in the CT images. The incorporation of specific design choices, such as the Mixed-Scale Dense Convolution Layer, Self-Attention Mechanism, Hierarchical Feature Fusion, and Attention-Based Contextual Information, enabled the MSHA model to capture and extract relevant features more effectively for tumor classification. The MSHA model’s architecture seems better suited to learning and representing the intricate patterns and structures in the CT images associated with ACC and kidney tumors. Also, the MSHA model was trained using an optimized configuration and effective training strategies, including careful selection of hyperparameters such as learning rate and batch size, and of regularization techniques. These configurations facilitate faster convergence and help the model find a better solution.
Despite being a sophisticated model known for its deep architecture and skip connections, ResNet50 exhibited the lowest performance of all the models. Its precision and F1 score of 66.0% suggest that ResNet50 had a higher rate of false positives and false negatives, resulting in suboptimal predictions. This could result from the characteristics of the data used and the architectural complexity of ResNet50, which may not be ideal for classifying CT images associated with ACC and kidney tumors. Also, the VGG19 model, designed as a deeper version of VGG16, performed slightly worse, with lower sensitivity, specificity, F1 score, TP, AUC, and accuracy than the VGG16 model. This difference may likewise be attributed to the increased complexity of the VGG19 model, leading to insufficient representation of the specific features relevant to the classification task.
Table 4. The Comparative Analysis of the Novel MSHA and Other State-of-the-art Models.
| Models | Precision % | Sensitivity % | Specificity % | F1 Score % | TP | FP | TN | FN | AUC | Loss | Accuracy % |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MSHA | 96.0 | 96.0 | 96.0 | 96.0 | 3444 | 204 | 3525 | 113 | 0.99 | 0.108 | 96.65 |
| ResNet50 | 66.0 | 66.0 | 66.5 | 66.0 | 2475 | 1173 | 2335 | 1303 | 0.72 | 0.615 | 66.02 |
| VGG16 | 81.0 | 81.0 | 81.0 | 81.0 | 2872 | 776 | 3004 | 634 | 0.90 | 0.424 | 80.65 |
| VGG19 | 81.0 | 80.0 | 80.0 | 80.0 | 2667 | 981 | 3181 | 457 | 0.89 | 0.424 | 80.26 |
| InceptionV3 | 72.0 | 72.0 | 72.0 | 72.0 | 2900 | 748 | 2302 | 1336 | 0.79 | 0.592 | 71.40 |
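As a cross-check on Table 4, accuracy can be recomputed from the confusion counts as (TP + TN) / (TP + FP + TN + FN). The sketch below does this for each row. The dictionary `COUNTS` simply copies the TP, FP, TN, and FN columns of Table 4, and the function name `accuracy_from_counts` is a hypothetical helper. The other metrics are left out because the text does not say whether they are per-class or averaged values.

```python
# (TP, FP, TN, FN) copied from Table 4.
COUNTS = {
    "MSHA": (3444, 204, 3525, 113),
    "ResNet50": (2475, 1173, 2335, 1303),
    "VGG16": (2872, 776, 3004, 634),
    "VGG19": (2667, 981, 3181, 457),
    "InceptionV3": (2900, 748, 2302, 1336),
}


def accuracy_from_counts(tp, fp, tn, fn):
    """Return accuracy in percent: correct predictions over all predictions."""
    return 100.0 * (tp + tn) / (tp + fp + tn + fn)


# Print one recomputed accuracy per model, rounded to two decimals.
for model, counts in COUNTS.items():
    print(f"{model}: {accuracy_from_counts(*counts):.2f}%")
```

Run as-is, this prints one accuracy per model. For the MSHA row the result is 95.65%, the accuracy listed in Table 3.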