1. Introduction
Cervical cancer (CC) markedly affects the mortality rate of females worldwide [
1]. This type of cancer is associated with a greater number of cancer-related deaths per year than breast cancer [
2]. Various scientific studies have shown the existence of an inequality in the incidence of CC between different parts of the world [
3,
4], as shown by the fact that it is the most common type of cancer and the leading cause of mortality from cancer in Latin America [
5,
7,
8]. The incidence rates of CC have been steadily increasing over the past years [
8].
CC is 99% linked to the human papillomavirus (HPV) [
9,
11,
12]. CC is an almost asymptomatic disease in its first two stages, which are the only two stages in which it is possible to treat the disease effectively [
6,
7]. For this reason, the early detection and management of early lesions are essential [
12]. However, over the past years, the mortality rate associated with CC has remained high in all Latin American countries, reflecting the inefficiency of programs and screening techniques [
3].
The early and accurate detection of CC remains a challenge in developing countries and CC continues to be a major health concern [
6,
13,
14]. The costs of diagnosis, treatment and control are among the highest in medicine; thus, this disease is considered catastrophic both collectively and institutionally [
14].
The diagnosis of CC depends on initial screening with Papanicolaou (Pap) smear cytology [
3]. The Pap smear consists of collecting cervical cells and examining them under a microscope to identify abnormalities [
4]. To date, it has been demonstrated that this type of screening has an efficacy of 70% in reducing the mortality rate associated with CC, and that its sensitivity ranges from 50 to 75% [
15,
16]. The Pap smear has a low sensitivity for pre-neoplastic events, such as grade 1 cervical intraepithelial neoplasia (CIN I); for this reason, numerous clinicians decide to perform a colposcopy with biopsy following a positive HPV test [
3]. The Pap smear has several limitations that undermine its performance. It has high human capital and time requirements, since cytotechnologists or trained pathologists must manually review numerous slides. It is subjective and inconsistent, as different observers may interpret the same cellular material differently. It is prone to human error, such as misclassification, false negatives, false positives or omissions. Finally, it has a low sensitivity and specificity, since subtle or rare abnormalities are often missed and some benign conditions can be confused with malignant ones [
4].
In recent years, improved detection of CC has been observed in developed countries owing to the implementation of liquid-based cytology (LBC). This is a novel method for the preservation and handling of cytological samples that can replace the traditional Pap smear, overcoming its limitations [
16,
17,
18]. In LBC, the sample is transferred to the fixative fluid, which increases the cytological detection of squamous intraepithelial lesions and reduces the number of unsatisfactory smears [
13]. In LBC, cells are filtered and transferred to the slide in a thin layer containing only one cell level, considered representative of the entire sample (monolayer). This facilitates the analysis of the sample compared to the conventional Pap test [
19], improving the quality and interpretation of the slides, reducing the number of false negatives and inadequate samples, due to fixation defects and masking of cellularity by excess blood, mucus, inflammatory cells or other artifacts [
13,
20]. Thus, LBC outperforms the traditional Pap smear in terms of representativeness, sample fixation and smear quality.
In the quest to optimize the early detection of CC, several researchers have evaluated different methods of cervical cell segmentation and computational algorithms based on artificial intelligence (AI) to automatically analyze cytology images and thus improve CC detection [
21,
22,
23]. AI techniques help automate the analysis of cytological images to classify them into healthy or abnormal categories, as well as to detect the severity and types of lesions [
24]. AI techniques also provide quantitative and objective results that are consistent and reproducible, and are promising in terms of specificity, sensitivity and diagnostic accuracy [
25]. The use of AI in the screening of cervical lesions has potential advantages, such as greater precision, shorter analysis times, a decreased need for human capital and the absence of biases due to analytical subjectivity [
26]. AI algorithms can process and analyze large sets of images, identifying patterns and inconsistencies that could be invisible to the human eye. In addition, AI-facilitated screening could decrease the need for invasive interventions, reducing patient discomfort and also improving outcomes [
23].
In this context, AI is put forward as a method that can help improve CC screening, contributing to and improving medical diagnoses, which will undoubtedly lead to a significant impact on society [
27,
28]. The aim of the present study was to evaluate the performance of a deep learning ResNet 50 model for cervical cell classification from Pap smear and liquid cytology images to improve the early detection of CC. To this end, a preprocessing algorithm was designed and its capacity to segment single-cell images suitable for training deep learning diagnostic classification models was studied. The diagnostic performance of this system was then compared with that of a model trained with liquid cytology images from an external repository.
3. Results
The segmentation of the sub-images generated from the original slide scan revealed that applying multiple binarization thresholds to the purified cell signal achieved, at best, the selection of poorly homogeneous cell groups in most sectors of the slide, the detection of large cell masses in some areas and, in very few instances, single-cell segmentation (
Figure 1A). As a result, the single-cell patches generated from each sub-image were in most cases noisy, with poorly defined cell edges, and did not clearly show the cell chromatin patterns (
Figure 1B). By contrast, the patches generated from the liquid cytology (LCyt) segmentations qualitatively showed single, well-defined, noise-free cells with evident nuclear patterns (
Figure 1C).
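The multi-threshold segmentation described above can be illustrated with a minimal sketch. The preprocessing algorithm itself is not detailed in this section, so everything here is an assumption made for illustration: the toy intensity grid, the convention that higher values correspond to cell signal, the 4-connectivity, and the hypothetical `min_area` and `max_area` limits used to separate single cells from large cell masses.

```python
from collections import deque


def connected_components(mask):
    """Return the 4-connected foreground regions of a boolean grid as pixel lists."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def single_cell_boxes(image, thresholds, min_area, max_area):
    """For each threshold, keep bounding boxes of regions whose area fits one cell."""
    boxes = {}
    for t in thresholds:
        # Binarize: pixels at or above the threshold count as cell signal (assumed).
        mask = [[pixel >= t for pixel in row] for row in image]
        kept = []
        for pixels in connected_components(mask):
            if min_area <= len(pixels) <= max_area:
                ys = [y for y, _ in pixels]
                xs = [x for _, x in pixels]
                kept.append((min(ys), min(xs), max(ys), max(xs)))
        boxes[t] = kept
    return boxes


# Toy sub-image: one isolated cell (left) and one dense clump (right).
image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 5, 5, 5, 5],
    [0, 9, 9, 0, 5, 9, 9, 5],
    [0, 0, 0, 0, 5, 9, 9, 5],
    [0, 0, 0, 0, 5, 5, 5, 5],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
boxes = single_cell_boxes(image, thresholds=(4, 8), min_area=3, max_area=6)
```

In this toy case the lower threshold rejects the clump as a large mass and keeps only the isolated cell, whereas the higher threshold also recovers the bright core of the clump. This mirrors, in a very simplified way, why several thresholds may be tried and why overlapping Pap smear cells remain hard to isolate.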
The single-cell patches generated in the Pap and LCyt datasets were each used to train a ResNet50 deep network-based classification model. Training with 9566 Pap patches, of which 3386 were labeled as malignant cell-positive, yielded moderate performance but showed a tendency to overfit, as seen in the change in the AUC and loss metrics over 100 epochs (
Figure 2A, validation loss increase). Likewise, the confusion matrix of the binary classification showed a substantial number of false negatives (predicted = 0, true = 1) and false positives (predicted = 1, true = 0,
Figure 2B).
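A rising validation loss of the kind described above is commonly handled with early stopping. The sketch below is not part of the reported training procedure; it is a generic illustration of how the epoch at which overfitting begins could be flagged. The function name, the `patience` and `min_delta` parameters, and the loss values are all hypothetical.

```python
def find_overfitting_onset(val_losses, patience=5, min_delta=0.0):
    """Return (best_epoch, stop_epoch) under a simple early-stopping rule.

    best_epoch is the epoch with the lowest validation loss seen so far;
    stop_epoch is the first epoch at which `patience` epochs have passed
    without improvement, or None if that never happens.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, None


# Synthetic validation losses (illustrative only, not the study's values).
synthetic_val_loss = [0.90, 0.70, 0.60, 0.55, 0.58, 0.60, 0.65, 0.70]
best_epoch, stop_epoch = find_overfitting_onset(synthetic_val_loss, patience=3)
```

With these synthetic values, the loss stops improving after epoch 3 and the rule would halt training at epoch 6, restoring the weights saved at the best epoch.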
From the confusion matrix, the final performance metrics detailed in
Table 1 were obtained. In these metrics, the high level of accuracy and specificity of the model stands out (0.8341 and 0.8386), although with a low level of sensitivity (0.6939).
Using the same architecture and hyperparameters of CNN-ResNet50, classifier training was evaluated, this time with single-cell patches from liquid cytology segmentation (LCyt set). Initially, the training was carried out with 50 epochs (data not shown) and as the metrics were close to convergence, the number of iterations was extended to 100 to determine whether the behavior of the model remained stable. As shown in
Figure 3A, the AUC and loss metrics across the epochs behaved more stably than those of the Pap model, with a very low tendency to overfit, as evidenced by the similarity between the training and validation curves. In addition, the confusion matrix showed a low incidence of false negative and false positive cases relative to true positives (
Figure 3B).
From the confusion matrix, the final performance metrics detailed in
Table 2 were obtained. In these metrics, the high level of accuracy and specificity of the model stands out (0.998 and 0.997), along with a high level of precision and sensitivity (0.998 and 0.999). In addition,
Table 2 presents the comparison of the two CNN models published with the same dataset of images by Sompawong *et al* [
22] and Chen *et al* [
31], where all the ResNet50-LCyt metrics are superior.
4. Discussion
The aim of the present study was to evaluate the performance of a deep learning (DL) ResNet 50 model for cervical cell classification using Pap and LBC samples to improve the early detection of CC. The DL model used herein was trained for binary cell classification, which is essential for the early detection of CC; identifying patients with lesions that will evolve into CC can help determine an adequate treatment strategy and may thus prevent cancer development.
CC [
1] remains one of the leading causes of cancer-related mortality in lower-income countries, despite being a highly preventable pathology. In Latin America and the Caribbean, CC is the third most common type of cancer, and HPV infection is present in >99% of cases with a worse prognosis [
32]. For decades, the standard method for detecting cervical lesions has been the Pap test [
33]. However, screening programs based on this technique in lower-income countries have rarely been successful in reducing the mortality and incidence rates of CC due to the lack of high coverage and the associated high analytical complexity [
34]. Traditional cytology begins with the collection of the sample, followed by the analytical phase and its report, then colposcopy when a biopsy is necessary and, finally, the treatment of the detected lesions [
35]. Along with the lengthy and costly Pap smear process [
26], in the majority of cases, conventional smears are difficult to interpret due to the uneven distribution and overlap of cells, and the presence of blood or inflammation [
36]. In addition, the test has a high rate of false negatives; 50% of preneoplastic lesions of the cervix are missed with a single test [
37]. The occurrence of false-negative reports depends on the morphological quality of the cells and on the abnormal cells being present in the sample in a recognizable form. This is difficult to achieve in the Pap smear due to its multilayer distribution pattern. As a result, a number of women with CC have a history of one or more negative cervical cytology reports when they are actually carriers of high-grade lesions [
38]. In addition, the interobserver reproducibility of cervical cytology is poor, as previously demonstrated by Stoler and Schiffman [
39], in a study analyzing the reproducibility of 4,948 monolayer cytologic interpretations. Of the 1,473 original interpretations of atypical squamous cells of undetermined significance (ASC-US), the second reviewer only concurred in 43.0% [
39].
The low sensitivity of the Pap smear requires repeating the test multiple times over the years for it to be effective. This is very costly and not affordable in the majority of Latin American countries. In this scenario, it is essential to develop new methods with which to detect CC earlier with low-cost and high-accuracy automated screening technologies [
40]. The alternative to the Pap smear is LBC, which makes immediate fixation easier and leaves the cells better visualized. LBC allows for a monolayer spreading where the majority of the debris, blood and exudate is removed [
38]. The benefits of these liquid-based methods include decreased obscuring materials and hemorrhage on the slide, a decrease in cellular misrepresentation, and an even cell distribution on the slide [
35]. In general, conventional screening with brightfield microscopes requires substantial human resources. In this context, the structure of the cells to be observed is complex, since the nucleus and cytoplasm are difficult to identify due to overlapping cellular areas and undefined boundaries between neighboring cells. However, this issue can be resolved with novel computer strategies that can classify images automatically and rapidly [
38].
AI has been progressively applied in recent years for the diagnosis of various pathologies with successful results when the evidence is imaging [
41]. AI models can automatically recognize key features of images and can learn how to classify and process data using efficient algorithms [
41,
42]. Based on the above, the application of AI in the screening and early diagnosis of CC is very useful for overcoming the current challenges of the technique [
26]. Machine learning in AI is based on several computational models, one of which is the convolutional neural network (CNN), which is mainly used for image processing and computer vision tasks. Within this family, there are different architectures, such as Visual Geometry Group 16 (hereinafter VGG16) [
43], Residual Network 50 (hereinafter ResNet50) [
44], and Mobile Network (hereinafter MobileNet) [
45]. Previous research has shown that ResNet50 is more effective than VGG16 and MobileNet; thus, it was considered the most suitable CNN architecture [
46]. In addition, recent studies have demonstrated that DL models are robust against changes in the aspect ratio of cervical cells on cytological imaging [
26,
47].
As previously stated, the early detection of cervical cancer is crucial for improving treatment and patient survival [
1,
35]. AI models, specifically DL models, have emerged as a promising tool for the early detection of this disease. The present study investigated the performance of the ResNet 50 DL model for classifying cervical cells in both conventional Pap smear and LBC images, with the aim of improving early cervical cancer detection [
22,
48,
50]. The findings presented herein suggest that the ResNet 50 model exhibits exceptional accuracy in classifying cervical cells from LBC images, achieving a near-perfect sensitivity, specificity, precision, accuracy and F1-score. This result indicates that the model is highly capable of accurately distinguishing between benign and malignant cells when trained on LBC images. However, the study also highlights the inherent challenges of using traditional Pap images for CC detection using AI [
16,
50]. The precise segmentation of individual cells from Pap images is hampered by cells overlapping, excessive staining and the presence of blood clots and artifacts. These factors make it difficult to accurately identify cell boundaries, leading to errors in cell classification [
21,
51]. This is reflected in the performance of the ResNet 50 model with Pap images, which, while achieving good accuracy and specificity, had a markedly lower sensitivity in detecting malignant cells. Specifically, when trained with Pap images, the ResNet 50 model achieved a sensitivity of only 69.39%, with a specificity of 83.86% and an accuracy of 83.41%. This suggests that the model was able to detect 1,351 out of 1,511 healthy cells, but only 943 out of 1,316 malignant cells. By contrast, when trained with LBC images, the ResNet 50 model demonstrated a much higher sensitivity of 99.9%, along with a specificity of 99.7% and an accuracy of 99.8%. This indicates that the model was able to detect 3,493 out of 3,503 healthy cells and 4,169 out of 4,174 malignant cells. It is important to consider that the performance of the ResNet 50 model was negatively affected by the imbalance in the Pap dataset, which contained a considerably larger number of benign cell images than malignant ones. This imbalance makes it difficult for the model to learn discriminative patterns for identifying malignant cells. Addressing this imbalance in future research is crucial for improving the detection of malignant cells using Pap images.
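As a worked check, the LBC counts quoted above can be turned into the standard binary classification metrics. The sketch below assumes that correctly detected healthy cells are true negatives and correctly detected malignant cells are true positives, with the remainder of each class counted as false positives and false negatives, respectively; the function and variable names are illustrative.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute standard metrics from the four cells of a binary confusion matrix."""
    sensitivity = tp / (tp + fn)  # recall on malignant cells
    specificity = tn / (tn + fp)  # recall on healthy cells
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# LBC counts as reported in the text: 4,169 of 4,174 malignant cells and
# 3,493 of 3,503 healthy cells were detected.
lbc = binary_metrics(tp=4169, fn=4174 - 4169, tn=3493, fp=3503 - 3493)
lbc_rounded = {name: round(value, 3) for name, value in lbc.items()}
```

Under these assumptions, the rounded values are a sensitivity of 0.999, a specificity of 0.997, a precision of 0.998 and an accuracy of 0.998, which agree with the ResNet50-LCyt metrics reported in the Results.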
LBC offers a significant advantage over Pap images due to its ability to better isolate and distribute cells, resulting in cleaner images with less overlapping and fewer artifacts [
16,
17,
20]. The segmentation artifacts in the traditional Pap smear are difficult to optimize, considering the complexity of the segmentation algorithm applied, which unlike a U-Net type algorithm for segmentation, has customized settings based on the specific nature of the sample [
52], also considering the low observational quality of the traditional cytology sample, which is not designed for the analysis of single-layer cells, such as liquid cytology [
38]. Despite the above, we believe that a better segmentation performance could be obtained through the direct use of a neural network of the U-Net or Attention U-Net type coupled directly to the classification model, an avenue that would be interesting to address and contrast in future studies with our cohorts [
21,
22].
LBC facilitates the precise segmentation of individual cells and therefore, improves the accuracy of cell classification. Despite the higher cost associated with LBC tests compared to Pap tests, the present study highlights the importance of adopting LBC to improve the accuracy of cervical cell classification and ultimately improve the early detection of CC [
53]. When comparing the results of the present study to those of previous research using both Pap and LBC images, it was found that the ResNet 50 model used herein achieved comparable or superior performance in terms of accuracy and specificity compared to previous studies using Pap images [
22,
31]. However, previous research using Pap images generally achieved a higher sensitivity, specificity and F1 score, suggesting that the performance of ResNet 50 with Pap images may be limited by the inherent challenges of cell segmentation mentioned above. On the other hand, when compared to studies using LBC images, the ResNet 50 model consistently outperformed the reported results across all performance metrics, including sensitivity, specificity, accuracy and F1 score [
44,
54,
55]. This highlights the significant advantage of using LBC images for DL-based cervical cell classification, underscoring the potential of the ResNet 50 model as a powerful tool for improving early detection when combined with LBC.
While the present study highlights promising results with LBC images, certain limitations should be acknowledged. The imbalance of the Pap dataset, which contained a considerably larger number of benign cell images than malignant ones, negatively affected the performance of the model. To address this imbalance, future research should focus on techniques such as data augmentation, the focal loss function and resampling to improve class balance. In addition, the model is limited to cytology for the investigation of CC. Although this is a pathology of high relevance in health systems, cytopathological studies are also frequently used in the investigation and diagnosis of other tumor pathologies, such as tumors of the renal and urinary tracts, head and neck tumors and airway tumors. This is commonly known in pathology services as miscellaneous cytology and, in this clinical scenario, the use of AI-based segmentation and classification methods has begun to be explored with successful results [
56,
57]. Despite these limitations, the findings of the present study strongly suggest that the implementation of DL models, particularly with the use of LBC, has immense potential to revolutionize CC screening. The ability of DL models to rapidly analyze massive datasets and identify subtle patterns of abnormality, surpassing human capabilities, presents a significant advantage in combating this disease. This is particularly crucial in public health systems facing a shortage of qualified personnel for analyzing cervical cytology samples.
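Of the imbalance remedies mentioned above, the focal loss is the easiest to show compactly. The following is a minimal single-sample sketch of the binary focal loss, FL = -alpha_t (1 - p_t)^gamma log(p_t), and is not the loss used in the present study. The default `gamma` and `alpha` values are assumptions, and deriving `alpha` from the Pap class frequencies (3386 malignant patches out of 9566) is only one possible heuristic.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.5):
    """Focal loss for one sample.

    p: predicted probability of the malignant class; y: true label (0 or 1).
    gamma down-weights well-classified samples; alpha weights the positive class.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# One heuristic (an assumption): weight the malignant class by the benign fraction.
alpha_malignant = 1.0 - 3386 / 9566

easy = binary_focal_loss(0.95, 1, alpha=alpha_malignant)  # confident, correct
hard = binary_focal_loss(0.30, 1, alpha=alpha_malignant)  # malignant cell scored low
```

With `gamma = 0` and `alpha = 0.5`, the expression reduces to half the usual cross-entropy; with `gamma > 0`, confidently classified cells contribute far less to the loss than misclassified malignant cells, which is the intended effect on an imbalanced dataset.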
In conclusion, the present study demonstrates that the ResNet 50 model, when trained on LBC images, is a powerful tool for the accurate classification of cervical cells and has the potential to improve the early detection of CC. The use of LBC offers significant advantages over Pap smear images, and the continued development of DL models, along with strategies to address data imbalance, has the potential to transform CC detection and improve health outcomes for patients.