2.1. Colon Cancer
Sena et al. [12] took a 'direct' approach in 2019, labeling raw images rather than segmenting them. A high overall accuracy was reached, with most mislabeling involving a neighboring category, and tests on an external dataset with a different resolution still produced high accuracies. This study showed that a properly trained neural network can provide fast, accurate, and reproducible labeling of colon cancer images, thereby improving the quality and timeliness of medical diagnostics. In 2019, Yoon et al. developed improved systems based on the Visual Geometry Group (VGG) network, which won the classification task of the 2014 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), and performed two experiments [13]. In the first, they identified the optimal modified VGG configuration for their incomplete dataset and reported the accuracy of each candidate configuration. In the second, the best-adjusted VGG configuration was used to assess the performance of the CNN model; the proposed modified VGG-E configuration demonstrated the highest performance in terms of accuracy, loss, sensitivity, and specificity across the entire dataset. In a 2019 study, Kather et al. investigated whether deep convolutional neural networks (CNNs) could derive prognosticators directly from these widely available images [14]. They manually delineated single-tissue regions in 86 CRC tissue slides from 25 CRC patients, yielding a large set of HE image patches, and used these to train a CNN with transfer learning, achieving high classification accuracy.
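Transfer learning of this kind typically reuses an ImageNet-pretrained backbone and retrains only a small classification head on the histology patches. The snippet below is a minimal, hypothetical Keras sketch of that workflow; the backbone choice (VGG16), patch size, class count, and data directory are illustrative assumptions, not details taken from Kather et al.

```python
# Minimal transfer-learning sketch (hypothetical): fine-tune an ImageNet-pretrained
# VGG16 head on tissue-patch folders. Patch size, class count, and paths are illustrative.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 9          # assumed number of tissue categories
PATCH_SIZE = (224, 224)  # assumed input resolution

train_ds = tf.keras.utils.image_dataset_from_directory(
    "patches/train", image_size=PATCH_SIZE, batch_size=32)

base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=PATCH_SIZE + (3,))
base.trainable = False  # freeze the convolutional features for transfer learning

model = models.Sequential([
    layers.Rescaling(1.0 / 255),
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```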
Wei et al. published a study in 2020 in which the prognostic analysis used histopathologic slides gathered from Dartmouth-Hitchcock Medical Center in Lebanon, New Hampshire [15]. This dataset consisted of 326 slides for training, 157 for internal evaluation, and 25 for validation. On the internal evaluation set of 157 slides, the deep neural network's mean accuracy (reported with a 95% confidence interval) was compared against the accuracy of local pathologists. For the external data collection, 238 slides from 179 different patients were received from 24 institutions in 13 states; on this external dataset, the deep neural network attained an accuracy comparable to the local pathologists' accuracy of 86.6% (95% CI, 82.3%-90.9%). In 2020, Iizuka et al. trained convolutional neural networks (CNNs) and recurrent neural networks (RNNs) on biopsy histopathology whole-slide images (WSIs) of the stomach and colon [16]. The models were trained to classify each WSI as adenocarcinoma, adenoma, or non-neoplastic. Evaluated on three separate test sets, they reached AUCs of 0.96 and 0.99 for colonic cancer and adenoma, respectively, indicating that the models generalize well and have considerable potential for use in a practical histopathological diagnostic workflow. In the same year, Xu et al. introduced a deep learning-based technique for colorectal cancer identification and segmentation on digitized H&E-stained histology slides [17]. Compared with pathologist-based diagnosis of H&E-stained slides digitized from clinical samples, the neural network approach achieved high median accuracies on both normal and cancer slides.
In 2021, Hamida et al. published research proposing two DL models for CNN-based histopathological image classification to diagnose colon cancer [18]. They achieved strong patch-level classification results, with ResNet reaching a high accuracy rate. Their ResNet model was evaluated on two individual datasets and on a merged dataset, demonstrating its effectiveness with high accuracy rates on all three; the same datasets were also evaluated with SegNet, and the corresponding accuracy rates were reported. In the same year, researchers including Babu and Tina worked on automatically extracting high-level characteristics from colon biopsy images for automated patient diagnosis and prognosis, using transfer learning architectures for colon cancer detection [19].
This study utilized a pre-trained CNN to extract visual features, which were then used to train a Bayesian-optimized Support Vector Machine classifier. The optimal network for colon cancer detection was examined using pre-trained neural networks such as Inception-V3, VGG-16, and AlexNet. Four datasets were used to assess the proposed framework: two from Indian hospitals, categorized by four different magnification levels, and two public datasets of colon images. In the analysis of the public datasets with the above-mentioned models, the Inception-V3 network outperformed the other tested frameworks. Tasnim et al. used CNNs with max-pooling and average-pooling layers, as well as a MobileNetV2 model, for colon cell image categorization [20]. The models were trained and tested over different numbers of epochs to determine a suitable learning rate, and the accuracies of the max-pooling and average-pooling variants were compared. MobileNetV2 surpassed the other two models, achieving the highest accuracy of the three.
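As a rough illustration of the pooling-layer comparison described above, the following hypothetical Keras snippet builds the same small CNN twice, once with max pooling and once with average pooling; the layer sizes, input shape, and class count are assumptions made only for illustration.

```python
# Hypothetical sketch: the same small CNN built with either max or average pooling,
# so the two pooling strategies can be trained and compared on the same data.
from tensorflow.keras import layers, models

def build_cnn(pooling="max", input_shape=(128, 128, 3), num_classes=2):
    Pool = layers.MaxPooling2D if pooling == "max" else layers.AveragePooling2D
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        Pool(),
        layers.Conv2D(64, 3, activation="relu"),
        Pool(),
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])

max_model = build_cnn("max")
avg_model = build_cnn("avg")
for m in (max_model, avg_model):
    m.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Each model would then be fit on the same training split and the
# validation accuracies compared, as in the study described above.
```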
Sakr et al. proposed a lightweight deep learning method in 2022, using CNNs to detect colon cancer in histopathological images efficiently and normalizing the input before training [21]. The system achieved an accuracy considered remarkable in a comparative analysis with existing methods, highlighting its potential for improving colon cancer detection. Hasan et al. also used CNNs in 2022 to analyze digital images of colon tissue and accurately classify adenocarcinomas [22]. Automated AI diagnosis could accelerate assessments and reduce associated costs by leveraging modern DL and digital image processing techniques, and the reported accuracy suggests that this approach could lead to automated systems for detecting various forms of colon cancer. In the same year, Talukder et al. introduced a hybrid ensemble feature extraction model aimed at efficiently detecting colon cancer with machine learning and deep learning techniques [23]. By integrating deep feature extraction and ensemble learning with high-performance filtering of cancer image datasets, the model achieved impressive accuracy for colon cancer detection on the histopathological LC25000 dataset.
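A hybrid pipeline of this general shape usually extracts deep features with a frozen CNN and feeds them to classical ensemble learners. The sketch below is a minimal, hypothetical Keras/scikit-learn illustration of that idea; the backbone, ensemble members, and placeholder arrays are assumptions, not the exact configuration used by Talukder et al.

```python
# Hypothetical sketch: deep features from a frozen CNN fed into an ensemble of
# classical classifiers. Backbone and ensemble members are illustrative choices.
import numpy as np
from tensorflow.keras.applications import DenseNet169
from tensorflow.keras.applications.densenet import preprocess_input
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Frozen backbone used purely as a feature extractor (global-average-pooled).
backbone = DenseNet169(include_top=False, weights="imagenet", pooling="avg")

def deep_features(images):
    """images: float array of shape (n, 224, 224, 3) with values in [0, 255]."""
    return backbone.predict(preprocess_input(images), verbose=0)

# Placeholder data standing in for the histopathology images and labels.
X_img = np.random.rand(8, 224, 224, 3) * 255.0
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

X_feat = deep_features(X_img)

ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier()),
                ("svm", SVC(probability=True)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="soft")
ensemble.fit(X_feat, y)
print(ensemble.predict(X_feat[:2]))
```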
A study by Bostanci's research team in 2023 analyzed RNA-seq data from extracellular vesicles of healthy individuals and colon cancer patients to develop predictive models for cancer presence and stage classification [24]. The study achieved high accuracy by utilizing both canonical machine learning and deep learning classifiers, including KNN, LMT, RT, RC, RF, 1-D CNN, LSTM, and BiLSTM. Both the canonical ML algorithms and the DL models reached high accuracies for cancer prediction and for cancer stage classification, with performance varying with the number of features, indicating that both families of models can effectively predict and classify colon cancer stages.
2.2. Lung Cancer
In 2019, Zhang et al. introduced a three-dimensional CNN that detects and classifies lung nodules as malignant or benign based on histological and laboratory results [25]. The well-trained model showed high sensitivity and specificity (each reported with a 95% CI), with smaller nodules (<10 mm) classified with higher sensitivity and specificity than larger nodules (10-30 mm). To validate the model, manual assessments by doctors of various grades were compared with the three-dimensional CNN results, and the CNN model outperformed the manual assessment. Pham et al. created a novel two-step deep learning system to address the problem of false-positive prediction while retaining accurate cancer diagnosis [26]. A total of 349 whole-slide images of lung cancer lymph nodes were gathered, including 233 slides for training, 10 for validation, and 106 for testing. The first step used a deep learning algorithm to exclude frequently misclassified noncancerous areas (lymphoid follicles); the second step trained a deep learning classifier to detect cancer cells. This two-step strategy reduced errors substantially on average, and even more on slides containing reactive lymphoid follicles, while maintaining high sensitivity for macro-metastases, micro-metastases, and isolated tumor cells.
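A two-step pipeline of this kind can be expressed as two cascaded classifiers: the first screens out the region types that cause false positives, and the second runs only on the remaining regions. The function below is a hypothetical Python sketch of that control flow; the tiny untrained models and thresholds are placeholders, not the actual models of Pham et al.

```python
# Hypothetical sketch of a two-step cascade: step 1 filters out regions that are
# frequently misclassified (e.g. lymphoid follicles), step 2 classifies the rest.
# The two tiny models below are untrained placeholders standing in for real ones.
import numpy as np
from tensorflow.keras import layers, models

def tiny_classifier():
    return models.Sequential([
        layers.Input(shape=(64, 64, 3)),
        layers.Conv2D(8, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(2, activation="softmax"),
    ])

follicle_model = tiny_classifier()   # step 1: lymphoid follicle vs. other tissue
cancer_model = tiny_classifier()     # step 2: cancer vs. non-cancer

def two_step_predict(patches, follicle_threshold=0.5, cancer_threshold=0.5):
    results = []
    for patch in patches:
        x = patch[np.newaxis, ...]
        # Step 1: discard patches flagged as lymphoid follicle.
        if follicle_model.predict(x, verbose=0)[0, 1] >= follicle_threshold:
            results.append("excluded (lymphoid follicle)")
            continue
        # Step 2: classify the remaining patches.
        p = cancer_model.predict(x, verbose=0)[0, 1]
        results.append("cancer" if p >= cancer_threshold else "non-cancer")
    return results

print(two_step_predict(np.random.rand(3, 64, 64, 3).astype("float32")))
```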
Gertych et al. developed a pipeline that used a CNN with soft-voting as the decision function to identify solid, micropapillary, acinar, and cribriform growth patterns, as well as non-tumor areas [27]. Slides of primary lung adenocarcinoma (LAC) were obtained from Cedars-Sinai Medical Center (CSMC), the Military Institute of Medicine in Warsaw (MIMW), and the TCGA portal. Several CNN models trained on image tiles extracted from 78 slides (MIMW and CSMC) were tested on 128 test slides from the three sites, using the F1-score against pathologists' manual tumor annotations. The best CNN produced high F1-scores for the solid, micropapillary, acinar, cribriform, and non-tumor classes, and high overall accuracy in recognizing the five tissue classes. Slide-based accuracy in the CSMC set was considerably higher (p < 2.3E-4) than in the MIMW and TCGA sets, indicating superior slide quality. Hatuwal and Thapa proposed a CNN in 2020 to categorize an image as benign, adenocarcinoma, or squamous cell carcinoma [28]. The model achieved high accuracies during both training and validation, and its performance was evaluated using precision, recall, F1-score, and a confusion matrix.
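Those evaluation metrics are straightforward to compute from predicted and true labels; the hypothetical scikit-learn snippet below shows the standard way to obtain per-class precision, recall, F1-score, and a confusion matrix for a three-class problem, with made-up label arrays standing in for real predictions.

```python
# Hypothetical sketch: computing the evaluation metrics mentioned above
# (precision, recall, F1-score, confusion matrix) for a three-class problem.
from sklearn.metrics import classification_report, confusion_matrix

classes = ["benign", "adenocarcinoma", "squamous_cell_carcinoma"]
y_true = ["benign", "adenocarcinoma", "squamous_cell_carcinoma", "benign",
          "adenocarcinoma", "squamous_cell_carcinoma"]          # placeholder labels
y_pred = ["benign", "adenocarcinoma", "adenocarcinoma", "benign",
          "adenocarcinoma", "squamous_cell_carcinoma"]          # placeholder predictions

print(classification_report(y_true, y_pred, labels=classes))    # per-class P/R/F1
print(confusion_matrix(y_true, y_pred, labels=classes))          # rows = true classes
```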
Saif et al. sought to use and modify current pre-trained CNN-based models to detect lung and colon cancer from histopathology images with improved augmentation strategies [29]. Eight distinct pre-trained CNN models were trained on the LC25000 dataset: VGG16, NASNetMobile, InceptionV3, InceptionResNetV2, ResNet50, Xception, MobileNet, and DenseNet169. Model performance was evaluated using precision, recall, F1-score, and accuracy, and GradCAM and SmoothGrad were used to render the attention images with which the pre-trained CNN models identify malignant and benign images. After training and testing on 1500 images, the proposed model's overall accuracy was compared with that of the VGG16 model, and high sensitivities were reported for the adenocarcinoma, benign, and squamous cell classes. Abbas et al. used several off-the-shelf CNNs pre-trained on the ImageNet dataset to classify the histopathological slides into three classes: benign lung tissue, squamous cell carcinoma, and adenocarcinoma [30]. The F1-scores of AlexNet, VGG-19, ResNet-18, ResNet-34, ResNet-50, and ResNet-101 were reported on the test dataset. Srinidhi et al. created the first deep learning-based classifier to distinguish lung adenocarcinoma, lung squamous cell carcinoma, small cell lung carcinoma, pulmonary tuberculosis, organizing pneumonia, and normal lung in 2021 [31]. The EfficientNet-B5 model outperformed ResNet-50 and was chosen as the classifier's backbone. The classifier was tested on 1067 slides from four medical centers and showed consistently high AUCs, with high intraclass correlation coefficients. In the same year, Han et al. used 50 top-ranked feature subset selection techniques for categorization [32]. The LDA and SVM classifiers, combined with the NR feature selection approach, performed optimally in terms of AUROC and accuracy. Their investigation also found that the random forest (RF) classifier and the NR feature selection approach performed well on average. Furthermore, the VGG16 DL algorithm beat all other machine learning methods when combined with radiomics.
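Feature-subset selection followed by a classical classifier, as in Han et al., is easy to prototype with scikit-learn. The snippet below is a hypothetical sketch that keeps a top-k feature subset and compares LDA, SVM, and random forest by cross-validated AUROC; the synthetic feature matrix, k, and scoring choices are illustrative assumptions.

```python
# Hypothetical sketch: top-k feature selection followed by several classical
# classifiers compared by cross-validated AUROC (binary problem assumed).
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Placeholder radiomics-style feature matrix (the real study used extracted features).
X, y = make_classification(n_samples=200, n_features=100, n_informative=20,
                           random_state=0)

classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "SVM": SVC(probability=True),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, clf in classifiers.items():
    pipe = make_pipeline(SelectKBest(f_classif, k=20), clf)  # keep top-20 features
    auroc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUROC = {auroc:.3f}")
```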
In a 2021 work, Marentakis et al. examined the potential of classifying NSCLC histology into AC and SCC using various feature extraction and classification approaches on pre-treatment CT scans [33]. The image dataset (102 patients) was obtained from the publicly available The Cancer Imaging Archive (TCIA) collection. They examined four families of techniques: (a) radiomics with two classifiers (kNN and SVM); (b) four state-of-the-art CNNs with transfer learning and fine-tuning (AlexNet, ResNet101, Inceptionv3, and InceptionResNetv2); (c) a CNN combined with a long short-term memory (LSTM) network to fuse information about the spatial coherence of tumor CT slices; and (d) combinatorial models (LSTM + CNN + radiomics). Additionally, two qualified radiologists independently assessed the CT images. Their findings indicated that Inception was the best stand-alone CNN in terms of accuracy and AUC, while LSTM + Inception outperformed all other algorithms and also beat the experts (p < 0.05).
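Fusing per-slice CNN features with an LSTM across the slice axis can be written compactly with Keras's TimeDistributed wrapper. The hypothetical sketch below shows that architecture; the number of slices, the backbone, and the layer sizes are assumptions made for illustration only.

```python
# Hypothetical sketch: a per-slice CNN wrapped in TimeDistributed, followed by an
# LSTM that fuses information across the stack of tumor CT slices.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

N_SLICES = 16                    # assumed number of CT slices per tumor
SLICE_SHAPE = (299, 299, 3)      # InceptionV3's expected input size

cnn = InceptionV3(include_top=False, weights="imagenet",
                  input_shape=SLICE_SHAPE, pooling="avg")
cnn.trainable = False            # use the CNN as a frozen per-slice feature extractor

model = models.Sequential([
    layers.Input(shape=(N_SLICES,) + SLICE_SHAPE),
    layers.TimeDistributed(cnn),             # (batch, slices, feature_dim)
    layers.LSTM(128),                        # fuse slice features along the slice axis
    layers.Dropout(0.5),
    layers.Dense(2, activation="softmax"),   # AC vs. SCC
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```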
In 2022, Abdul Rahaman Wahab Sait developed a deep learning model for lung cancer detection using an annotated PET/CT image dataset [34]. He addressed challenges such as computational complexity by employing preprocessing, augmentation, and model optimization. CNN-based DenseNet-121 and MobileNetV3 models were constructed to extract features and identify the types of lung cancer. His model achieved high accuracy and a high Cohen's Kappa value with fewer parameters, and it can potentially aid in early-stage lung cancer detection. In 2022, Shandilya and Nayak formulated a computer-aided diagnostic (CAD) approach for classifying histopathological images of lung tissues [35]. Using a publicly available dataset of histopathological images, they extracted image features and assessed pre-trained convolutional neural network models, including MobileNet, VGG-19, ResNet-101, DenseNet-121, DenseNet-169, InceptionV3, Inception-ResNet-V2, and MobileNetV2; among them, ResNet-101 attained the highest accuracy. In the same year, Ameer et al. developed a deep learning model for automated lung cancer cell detection in histopathological tissue images [36]. They used several models, encompassing InceptionV3, random forest, and CNNs, trained meticulously to extract important features from the images and thereby improve the efficiency and accuracy of lung cancer cell detection. The proposed model achieved remarkable accuracy, precision, recall, F-score, and specificity measures.
In 2023, Priyadarsini et al. proposed a framework designed to detect and categorize lung cancer using deep learning models trained on X-ray and CT scan images [37]. Three deep learning models (sequential, functional, and transfer models) were implemented and trained on open-source datasets to improve patient treatment. Emphasizing deep learning methods, particularly CNNs, they extracted specific features from the image datasets. The functional model stood out for its accuracy and specificity in lung cancer detection while requiring fewer parameters and computational resources than existing models. Siddiqui et al. introduced a method for lung CT image classification in 2023, focusing on enhancing efficiency and accuracy [38]. The method employed an enhanced Gabor filter for pre-processing, reducing parameters using a Gauss-Kuzmin distribution to maintain detail while minimizing computational load. Feature selection was conducted via an enhanced deep belief network (E-DBN) with two cascaded restricted Boltzmann machines (RBMs), followed by evaluation with five classifiers, leading to the selection of a support vector machine (SVM) for optimal performance. Experimental results demonstrated superior accuracy and sensitivity compared with existing methods in terms of F1-score and accuracy, suggesting promising advances in lung cancer diagnosis through advanced image processing techniques. Wahid et al. proposed a CAD system in 2023 that utilizes CNNs to detect lung cancer in the LC25000 dataset, which encompasses 25,000 histopathological color image samples [39]. Four CNN models were used: ShuffleNet-V2, GoogLeNet, ResNet-18, and a customized CNN model. Among them, ShuffleNet-V2 achieved the highest accuracy and exhibited the shortest training time.
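A deep belief network is usually built from stacked restricted Boltzmann machines whose learned features feed a conventional classifier. The hypothetical scikit-learn sketch below stacks two BernoulliRBM layers in front of an SVM as a rough analogue of the E-DBN + SVM arrangement described above; the component sizes and the stand-in dataset are illustrative assumptions.

```python
# Hypothetical sketch: two stacked RBMs as unsupervised feature learners,
# followed by an SVM classifier, loosely analogous to an E-DBN + SVM pipeline.
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)        # placeholder image-like data

pipeline = Pipeline([
    ("scale", MinMaxScaler()),             # RBMs expect inputs in [0, 1]
    ("rbm1", BernoulliRBM(n_components=256, learning_rate=0.05, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=64, learning_rate=0.05, random_state=0)),
    ("svm", SVC(kernel="rbf")),            # final classifier on the learned features
])
pipeline.fit(X, y)
print("training accuracy:", pipeline.score(X, y))
```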
2.3. Lung and Colon Cancer
A 2020 study by Masud et al. aimed to offer a computer-aided diagnosis system for diagnosing squamous cell carcinomas, lung adenocarcinomas, and colon adenocarcinomas using convolutional neural networks and digital pathology images [40]. A shallow neural network architecture was employed to identify the histological slides as squamous cell carcinoma, adenocarcinoma, or benign lung tissue, and a similar methodology was used to classify adenocarcinomas and benign colon tumors. High diagnostic accuracies were reported for both the lung and colon tasks. Garg et al. also published a work in 2020 that sought to use and modify current pre-trained CNN-based models to detect lung and colon cancer from histopathology images with improved augmentation strategies [41]. The article trained eight distinct pre-trained CNN models on the LC25000 dataset: VGG16, NASNetMobile, InceptionV3, InceptionResNetV2, ResNet50, Xception, MobileNet, and DenseNet169. Model performance was evaluated using precision, recall, F1-score, accuracy, and AUROC scores, and all eight models achieved significant results across a range of accuracies. GradCAM and SmoothGrad were then utilized to render the attention images with which the pre-trained CNN models identify malignant and benign images.
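Grad-CAM highlights the image regions that most influence a CNN's class prediction by weighting the last convolutional feature maps with the gradients of the class score. The snippet below is a hypothetical TensorFlow/Keras sketch of that computation for a generic pre-trained model; the model, layer name, and random input are placeholders rather than the exact setup used in these studies.

```python
# Hypothetical Grad-CAM sketch: a heatmap of the regions that drive a class score,
# computed from the last convolutional layer of a pre-trained CNN.
import numpy as np
import tensorflow as tf

model = tf.keras.applications.VGG16(weights="imagenet")   # placeholder model
last_conv_name = "block5_conv3"                            # VGG16's last conv layer

def grad_cam(image_batch, class_index):
    """image_batch: preprocessed array of shape (1, 224, 224, 3)."""
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(last_conv_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image_batch)
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)             # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))              # global-average the grads
    cam = tf.reduce_sum(conv_out[0] * weights[0], axis=-1)    # weighted feature maps
    cam = tf.nn.relu(cam)                                     # keep positive evidence
    cam = cam / (tf.reduce_max(cam) + 1e-8)                   # normalize to [0, 1]
    return cam.numpy()                                        # 14x14 heatmap for VGG16

heatmap = grad_cam(np.random.rand(1, 224, 224, 3).astype("float32"), class_index=0)
print(heatmap.shape)
```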
Ali et al. presented a novel multi-input dual-stream capsule network in 2021 that uses the powerful feature-learning capabilities of conventional and separable convolutional layers to classify histopathological images of lung and colon cancer into five categories (three malignant and two benign) [42]. They pre-processed the dataset with a novel color-balancing technique that adjusts the three color channels before gamma correction and sharpening of the most noticeable features. The proposed model was given two inputs simultaneously (one with the original images and the other with the pre-processed images), allowing it to learn features more effectively, and it achieved a high overall accuracy and F1-score.
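A dual-stream, multi-input design can be expressed with the Keras functional API as two parallel convolutional branches whose features are concatenated before classification. The following hypothetical sketch shows that skeleton with ordinary and separable convolutions; it is a simplified stand-in, not the capsule network described by Ali et al., and all sizes are assumed.

```python
# Hypothetical sketch: two parallel input streams (original and pre-processed image),
# one with ordinary convolutions and one with separable convolutions, merged before
# a five-class softmax. A simplified stand-in for a dual-stream architecture.
from tensorflow.keras import layers, models

INPUT_SHAPE = (128, 128, 3)   # assumed image size

def conv_stream(x):
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    return layers.GlobalAveragePooling2D()(x)

def separable_stream(x):
    x = layers.SeparableConv2D(32, 3, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.SeparableConv2D(64, 3, activation="relu")(x)
    return layers.GlobalAveragePooling2D()(x)

in_original = layers.Input(shape=INPUT_SHAPE, name="original_image")
in_processed = layers.Input(shape=INPUT_SHAPE, name="preprocessed_image")

merged = layers.concatenate([conv_stream(in_original),
                             separable_stream(in_processed)])
hidden = layers.Dense(128, activation="relu")(merged)
out = layers.Dense(5, activation="softmax")(hidden)

model = models.Model(inputs=[in_original, in_processed], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```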
In research published in 2021, Mehedi et al. described a unique DL-based supervised learning approach that uses pathological image analysis to identify five distinct tissue types (two non-cancerous, three cancerous) present in lung and colon tumors [40]. The LC25000 dataset was utilized for both training and validation. Two different kinds of domain transformation were used to obtain four sets of features, which were then concatenated into a combined feature set carrying both kinds of information. The results confirm that the model is accurate and reliable for identifying lung and colon cancer, with a high F-measure and a high peak classification accuracy.
In 2022, Hage et al. developed CAD systems using artificial intelligence to accurately classify different types of colon and lung tissues based on histopathological images [43]. The researchers used machine learning models, including XGBoost, SVM, RF, LDA, MLP, and LightGBM, to classify histopathological images obtained from the LC25000 dataset. The models achieved satisfactory accuracy and precision in identifying lung and colon cancer subtypes, with the XGBoost model performing best in terms of accuracy and F1-score. Talukder et al. developed a hybrid ensemble model for the efficient detection of lung and colon cancer, combining deep feature extraction and ensemble learning with high-performance filtering to analyze the LC25000 histopathological image dataset [23]. The model was evaluated on a set of metrics and achieved high accuracy rates for detecting lung and colon cancer. Mehmood et al. also developed a highly accurate and computationally efficient model for the rapid and precise diagnosis of lung and colon cancer in 2022 [44]. They utilized a dataset of images divided into five classes and trained the model by modifying four layers of the pre-trained AlexNet network, achieving a high overall accuracy. They further enhanced the image quality through contrast enhancement techniques, which improved the accuracy further.
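Contrast enhancement of histology tiles before training is commonly done with histogram-based methods such as CLAHE; the exact technique used by Mehmood et al. is not specified here, so the OpenCV snippet below is only a hypothetical illustration of that preprocessing step, with a placeholder file path.

```python
# Hypothetical sketch: contrast enhancement of an RGB histology tile with CLAHE,
# applied to the lightness channel only. The specific enhancement method used in
# the cited study is not stated here; this is an illustrative choice.
import cv2

def enhance_contrast(bgr_image, clip_limit=2.0, tile_grid_size=(8, 8)):
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid_size)
    lab = cv2.merge((clahe.apply(l), a, b))          # enhance L, keep color channels
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

# Example usage (the path is a placeholder):
# tile = cv2.imread("tile.png")
# enhanced = enhance_contrast(tile)
```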
In 2023, Singh et al. presented an ensemble classifier that combined random forest, support vector machine (SVM), and logistic regression [45]. Deep features from lung and colon cancer images in the LC25000 dataset were extracted using VGG16 and binary pattern methods, providing the initial relevant features for the ensemble classifier, and the proposed methodology achieved high average accuracy, precision, and recall. Bhattacharya et al. proposed a framework that combined deep learning and meta-heuristic approaches for the accurate prediction of lung and colon cancer from histopathological images, training the ResNet-18 and EfficientNet-b4-wide deep learning models on the LC25000 dataset and extracting deep features [46]. They developed the AdBet-WOA hybrid meta-heuristic optimization algorithm to remove redundancy in the feature vector and used an SVM classifier to distinguish lung and colon cancer, achieving impressively high accuracy. Al-Jabbar et al. developed three strategies, each with two systems, to analyze the dataset in 2023 [47]. The images were enhanced to increase the contrast of affected areas and then processed with the GoogLeNet and VGG-19 models, followed by dimensionality reduction using the PCA method to retain essential features. Using an ANN with fused features from the CNN models and handcrafted features, they reached high sensitivity, precision, accuracy, specificity, and AUC values, indicating the effectiveness of the proposed approach for the early diagnosis of lung and colon cancer.
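Fusing deep features with handcrafted descriptors before a small neural classifier is a common pattern behind approaches like Al-Jabbar et al.'s. The hypothetical sketch below concatenates frozen-CNN features with a simple color-histogram descriptor and trains scikit-learn's MLP on the fused vector; the backbone, the descriptor, and the placeholder data are illustrative assumptions rather than the authors' exact handcrafted features.

```python
# Hypothetical sketch: fusing deep CNN features with a handcrafted color-histogram
# descriptor, then training a small ANN (MLP) on the concatenated feature vector.
import numpy as np
from sklearn.neural_network import MLPClassifier
from tensorflow.keras.applications import VGG19
from tensorflow.keras.applications.vgg19 import preprocess_input

backbone = VGG19(include_top=False, weights="imagenet", pooling="avg")  # frozen extractor

def color_histogram(image, bins=16):
    """Simple handcrafted descriptor: per-channel intensity histogram."""
    return np.concatenate([np.histogram(image[..., c], bins=bins,
                                        range=(0, 255))[0] for c in range(3)])

def fused_features(images):
    deep = backbone.predict(preprocess_input(images.copy()), verbose=0)
    handcrafted = np.stack([color_histogram(img) for img in images])
    return np.concatenate([deep, handcrafted], axis=1)   # deep + handcrafted fusion

# Placeholder data standing in for histopathology images and labels.
X_img = np.random.rand(8, 224, 224, 3) * 255.0
y = np.array([0, 1, 2, 3, 4, 0, 1, 2])                   # five classes, as in LC25000

clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300)
clf.fit(fused_features(X_img), y)
```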