1. Introduction
Gallbladder cancer (GBC), a common malignant tumor of the biliary system, is characterized by insidious onset, rapid progression, early metastasis, and poor prognosis. Its incidence is closely related to gallstones and chronic cholecystitis [1, 2]. Because GBC is highly malignant and lacks specific symptoms and signs in its early stages, distant metastasis has often already occurred by the time the disease is detected. The 5-year survival rates of GBC patients in stages T3 and T4 are 32.4% and 3.5%, respectively [3, 4]. At present, early diagnostic methods with good specificity and sensitivity for GBC are still lacking, and most clinically detected cases are in the middle or late stages [5]. Studies have shown that the incidence of lymph node and distant metastasis in GBC patients ranges from 17.9% to 64.5%, and that the most common metastatic sites are the liver, lungs, and peritoneum [6, 7, 8]. Among GBC patients, those with distant metastasis have a worse prognosis than those without, and their one-year survival rate is 20% to 50% [7, 9]. Research has shown that distant metastasis is an important predictor of survival in GBC patients [10]. Early assessment of the risk of distant metastasis is therefore crucial for early intervention and for improving the prognosis of patients with T1 and T2 stage GBC. Although the nomogram is currently the most commonly used clinical prediction model, machine learning algorithms are increasingly being applied to construct clinical models because of their practicality, innovation, and accuracy [11]. Machine learning algorithms have broad prospects for using complex, large-scale clinical data in disease diagnosis and outcome prediction. Previous studies have shown that machine learning has advantages over traditional methods in clinical prediction research based on large datasets [12].
Therefore, this study aims to establish a machine learning model to predict the occurrence of distant metastasis in GBC patients. Such a model could help clinicians make more personalized clinical decisions, improve patient prognosis through early intervention, and enhance patients' quality of life.
4. Discussion
In this study, we combined machine learning algorithms with clinicopathological features to construct a model for predicting distant metastasis of gallbladder cancer. Unlike previous studies, this study analyzes and predicts distant metastasis in GBC patients with machine learning models. Using the SEER database, we compared the predictive performance of seven machine learning algorithms and found that the model based on the RF algorithm performed best.
Although gallbladder cancer is relatively rare and its incidence is rising slowly, it is still the most common malignant tumor of the biliary system [2, 14]. Treatment outcomes are poor once GBC has progressed to the middle and late stages. The overall survival (OS) rate of GBC patients is about 17.8% to 21.7%, and the 5-year OS rate is only 5% [15, 16, 17]. The 5-year survival rate of patients with T1 stage GBC is as high as 95% to 100%, whereas the 5-year survival rates of T3 and T4 stage patients are only 23% and 12%, respectively [18]. The prognosis of GBC patients with distant metastasis is worse than that of patients without metastasis, and their 1-year survival rate is between 20% and 50% [7, 9]. Therefore, assessing the risk of distant metastasis in early gallbladder cancer and establishing corresponding predictive models are crucial for the early identification of, and clinical intervention in, distant metastasis, and thereby for improving prognosis. At present, research on distant metastasis of gallbladder cancer focuses mainly on disease prognosis and relies mostly on nomograms built from traditional logistic regression (LR) models or Cox competing-risk models [6, 19, 20]. The traditional logistic regression model evaluates the association between risk factors and a specific outcome, and its coefficients reflect the strength of that association. However, logistic regression models also have shortcomings, such as sensitivity to multicollinearity and the lack of a mechanism to prevent overfitting [21]. With the continuing progress of artificial intelligence technology, machine learning (ML) models are increasingly used in tumor diagnosis and prognosis assessment [22, 23]. ML algorithms can also compensate for shortcomings of traditional logistic regression models, such as overfitting and imbalanced data distributions [24]. In this study, we applied ML algorithms for the first time to predict distant metastasis of T1 and T2 stage gallbladder cancer, with the aim of improving patient prognosis through early intervention.
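The way logistic regression coefficients express the strength of a risk factor can be illustrated with a minimal sketch. The coefficient values, the intercept, and the predictor names below are hypothetical and are not taken from this study; the sketch only shows how a coefficient is commonly converted to an odds ratio and how the model maps predictors to a probability.

```python
import math

# Hypothetical coefficients for illustration only; NOT results of this study.
coefficients = {
    "n_stage_positive": 0.9,
    "tumor_size_ge_2cm": 0.5,
    "grade_poorly_differentiated": 0.7,
}
intercept = -2.0  # hypothetical baseline log-odds


def odds_ratio(beta):
    """Odds ratio associated with a one-unit increase in a predictor."""
    return math.exp(beta)


def predicted_probability(features):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_j * x_j))))."""
    log_odds = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Print the odds ratio implied by each hypothetical coefficient.
for name, beta in coefficients.items():
    print(f"{name}: OR = {odds_ratio(beta):.2f}")

# A hypothetical patient with positive nodes and a tumor of 2 cm or larger.
patient = {
    "n_stage_positive": 1,
    "tumor_size_ge_2cm": 1,
    "grade_poorly_differentiated": 0,
}
print(f"predicted probability = {predicted_probability(patient):.3f}")
```

An odds ratio above 1 means the factor is associated with higher odds of the outcome, and a value of exactly 1 means no association under the model.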
This study aimed to construct a machine learning model to predict distant metastasis in patients with T1 and T2 stage gallbladder cancer, and to identify the factors associated with distant metastasis in GBC patients through logistic regression analysis.
Univariate and multivariate logistic regression analyses showed that age, history, tumor size, T stage, N stage, and grade were all predictive factors for distant metastasis of gallbladder cancer, which is consistent with previous research findings [6]. In line with the logistic regression results, the feature importance of the RF model also indicates that grade is a key predictive variable for distant metastasis of gallbladder cancer. Tumor grade is an indicator of how closely tumor cells resemble the tissue of the source organ in morphology and function [25].
Previous studies have also found that grade plays an important predictive role in distant metastasis and prognosis in gallbladder cancer patients [6, 7, 20]. The higher the grade, the poorer the cell differentiation; higher-grade tumors are typically more invasive, infiltrate more widely, and are more prone to distant metastasis [20].
Studies have shown [26] that poorly differentiated GBCs are more likely to undergo distant metastasis, which is similar to the conclusion of this study. Lymph node status is a commonly used predictor of metastasis and prognosis in gastrointestinal malignant tumors [27, 28], and a thorough evaluation of lymph node status is also necessary for patient treatment [29, 30]. This study found that N stage is an important factor in predicting distant metastasis in gallbladder cancer: logistic regression showed that the probability of distant metastasis is higher when lymph node metastasis is detected. This study also found that gallbladder cancer patients with tumors of 2 cm or larger are more likely to develop distant metastasis, which is consistent with previous research results [6].
ML uses computers to mimic human learning abilities and improves its performance by rebuilding data analysis models [31]. In the past decade, machine learning algorithms have been widely applied in the medical field and have achieved remarkable results in the diagnosis, treatment, and prognosis of diseases [32]. Compared with traditional data analysis methods, machine learning has significant advantages. On the one hand, it can process large datasets more efficiently; on the other hand, it can handle nonlinear data more appropriately through different algorithms and statistical models, whereas traditional methods may not achieve satisfactory results with nonlinear data. In many studies [13], the predictive performance of machine learning has been superior to that of traditional methods. In this study, RF proved to be an effective machine learning model. The RF model adopts advanced classification decisions and different weighting ratios, which not only outperforms other techniques in processing large numbers of features and highly nonlinear data, but also improves the utilization of analytical information, thereby yielding a prediction model with better predictive performance [12].
We constructed seven predictive models based on the SEER database to evaluate distant metastasis in patients with T1 and T2 gallbladder cancer. The seven models were evaluated by accuracy, precision, recall, F1 score, and AUC. Among them, RF showed good predictive ability (AUC = 0.913, F1 score = 0.836). The RF algorithm was the best model for predicting distant metastasis of gallbladder cancer using the SEER database.
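For readers less familiar with the evaluation metrics named above, the following is a minimal sketch of how accuracy, precision, recall, F1 score, and AUC can be computed for a binary classifier. The labels and scores are toy values invented for illustration, the 0.5 decision threshold is an assumption, and the function names are hypothetical; none of this reproduces the study's data or pipeline.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = metastasis)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 score from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(classification_metrics(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two kinds of metric are usually reported together.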
This study also has some limitations: 1) As it is based on North American demographic data, the model needs to be validated in external populations in future studies. 2) The model's performance could be further improved, and more risk factors could be incorporated in the future. 3) The SEER database lacks important information such as tumor family history, bilirubin, and tumor markers, which may also be important predictive factors for distant cancer metastasis. To address these issues, we will collect more information and conduct in-depth supplementary research in the future.