Submitted: 26 September 2024
Posted: 26 September 2024
Feature Name | Type | Description | Feature Value |
Age | Integer | The age of the patient in years. | Any positive integer value (e.g., 20, 45, 60) |
Gender | Categorical | The gender of the patient. | Male or Female |
Smoking | Categorical | Indicates whether the patient is a current smoker. | Yes or No |
Hx Smoking | Categorical | History of smoking. | Yes or No |
Hx Radiothreapy | Categorical | History of receiving radiotherapy, particularly in the head and neck area. | Yes or No |
Thyroid Function | Categorical | The functional status of the thyroid gland (e.g., Normal, Hyperthyroidism, Hypothyroidism). | Euthyroid, Clinical Hyperthyroidism, Subclinical Hypothyroidism, Clinical Hypothyroidism, or Subclinical Hyperthyroidism |
Physical Examination | Categorical | Findings from a physical examination of the patient. | Multinodular goiter, Single nodular goiter-right, Single nodular goiter-left, Normal or Diffuse goiter |
Adenopathy | Categorical | Presence of swollen or enlarged lymph nodes, indicating potential spread of cancer. | No, Right, Bilateral, Left, Extensive, or Posterior |
Pathology | Categorical | Histopathological findings from a biopsy of the thyroid tissue (e.g., Papillary, Follicular). | Papillary, Micropapillary, Follicular, or Hurthel cell |
Focality | Categorical | Indicates whether the cancer is unifocal (single tumor) or multifocal (multiple tumors). | Uni-Focal or Multi-Focal |
Risk | Categorical | Overall risk assessment based on various factors like tumor size, lymph node involvement, etc. | Low, Intermediate or High |
T | Categorical | Tumor (T) stage in the TNM classification system, describing the size and extent of the primary tumor. | T2, T3a, T1a, T1b, T4a, T3b, or T4b |
N | Categorical | Node (N) stage in the TNM classification system, indicating lymph node involvement. | N0, N1b, or N1a |
M | Categorical | Metastasis (M) stage in the TNM classification system, indicating whether cancer has spread distantly. | M0 or M1 |
Stage | Categorical | Overall cancer stage determined by combining T, N, and M stages (I, II, III, IV). | I, II, III, IVA or IVB |
Response | Categorical | Indicates the patient's response to treatment (e.g., Complete, Partial, Stable, Progressive). | Excellent, Indeterminate, Structural Incomplete, or Biochemical Incomplete |
Recurred | Categorical | Target variable indicating whether the thyroid cancer has recurred after treatment. | Yes or No |
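The feature table above amounts to a small schema: one positive-integer field and sixteen categorical fields with fixed value sets. As a minimal sketch, that schema can be written down and used to check a patient record before modelling. The names `ALLOWED_VALUES`, `validate_record`, and the example record are hypothetical illustrations, not part of the study's code; the value spellings are copied from the table as listed.

```python
# Allowed values for each categorical feature, copied from the table above.
# Spellings (e.g. "Hx Radiothreapy", "Hurthel cell") are kept exactly as listed.
ALLOWED_VALUES = {
    "Gender": {"Male", "Female"},
    "Smoking": {"Yes", "No"},
    "Hx Smoking": {"Yes", "No"},
    "Hx Radiothreapy": {"Yes", "No"},
    "Thyroid Function": {
        "Euthyroid", "Clinical Hyperthyroidism", "Subclinical Hypothyroidism",
        "Clinical Hypothyroidism", "Subclinical Hyperthyroidism",
    },
    "Physical Examination": {
        "Multinodular goiter", "Single nodular goiter-right",
        "Single nodular goiter-left", "Normal", "Diffuse goiter",
    },
    "Adenopathy": {"No", "Right", "Bilateral", "Left", "Extensive", "Posterior"},
    "Pathology": {"Papillary", "Micropapillary", "Follicular", "Hurthel cell"},
    "Focality": {"Uni-Focal", "Multi-Focal"},
    "Risk": {"Low", "Intermediate", "High"},
    "T": {"T1a", "T1b", "T2", "T3a", "T3b", "T4a", "T4b"},
    "N": {"N0", "N1a", "N1b"},
    "M": {"M0", "M1"},
    "Stage": {"I", "II", "III", "IVA", "IVB"},
    "Response": {
        "Excellent", "Indeterminate",
        "Structural Incomplete", "Biochemical Incomplete",
    },
    "Recurred": {"Yes", "No"},
}


def validate_record(record):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    age = record.get("Age")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(age, bool) or not isinstance(age, int) or age <= 0:
        problems.append(f"Age: expected a positive integer, got {age!r}")
    for feature, allowed in ALLOWED_VALUES.items():
        value = record.get(feature)
        if value not in allowed:
            problems.append(f"{feature}: unexpected value {value!r}")
    return problems


# A made-up record used only to exercise the check.
example = {
    "Age": 45, "Gender": "Female", "Smoking": "No", "Hx Smoking": "No",
    "Hx Radiothreapy": "No", "Thyroid Function": "Euthyroid",
    "Physical Examination": "Multinodular goiter", "Adenopathy": "No",
    "Pathology": "Papillary", "Focality": "Uni-Focal", "Risk": "Low",
    "T": "T1a", "N": "N0", "M": "M0", "Stage": "I",
    "Response": "Excellent", "Recurred": "No",
}
print(validate_record(example))  # prints []
```

A check of this kind is useful mainly for catching spelling variants in the categorical values before they are encoded.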
Dataset | Method | ARI | V-Measure | Silhouette Coefficient | PC1 Variance |
BaseData | PCA* | 0.557 | 0.451 | 0.489 | 1.200 |
BaseData | tICA | 0.179 | 0.165 | 0.318 | 1.001 |
BaseData | tSVD* | 0.558 | 0.459 | 0.537 | 0.537 |
BaseData | NMF | 0.013 | 0.102 | 0.352 | 0.156 |
BaseData | UMAP | -0.076 | 0.093 | 0.604 | 2.565 |
BaseData | t-SNE | 0.258 | 0.277 | 0.362 | 22.727 |
BaseData | Isomap | 0.258 | 0.292 | 0.334 | 4.477 |
BaseData | LLE | -0.081 | 0.083 | 0.633 | 0.049 |
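The ARI column above contains negative values (UMAP and LLE), which can look surprising next to their high silhouette coefficients. The adjusted Rand index is corrected for chance, so it falls below zero when two labelings agree less than random assignment would. The following is a minimal standard-library sketch of the textbook formula; the source does not say which implementation was used, and the function name `adjusted_rand_index` is an illustrative choice.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: trivially identical

    # Contingency counts: samples per (true class, predicted cluster) cell,
    # plus the row and column totals.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of sample pairs that share a cell, a true class, or a cluster.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2           # agreement of a perfect match
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster on both sides
    return (sum_cells - expected) / (maximum - expected)


# Same partition with swapped label names: perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# Clusters that cut across the classes: worse than chance, hence negative.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The second call shows how a clustering can be internally well separated yet score below zero against the reference labels.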
Model | Hyperparameter | PCA-Model Pipeline | tSVD-Model Pipeline |
RF | criterion | log_loss | entropy |
RF | max_depth | None | None |
RF | class_weight | {0:1, 1:3} | {0:1, 1:3} |
RF | min_samples_leaf | 4 | 2 |
RF | min_samples_split | 4 | 5 |
RF | n_estimators | 400 | 400 |
RF | max_features | log2 | log2 |
GB | criterion | squared_error | friedman_mse |
GB | learning_rate | 0.36 | 0.35 |
GB | loss | exponential | log_loss |
GB | max_depth | 5 | 5 |
GB | n_estimators | 152 | 150 |
SVM | C | 0.12 | 0.25 |
SVM | kernel | sigmoid | sigmoid |
LR | C | 0.35 | 0.1 |
LR | solver | liblinear | liblinear |
LR | max_iter | 5000 | 5000 |
LR | penalty | l2 | l2 |
KNN | n_neighbors | 17 | 18 |
KNN | weights | distance | distance |
KNN | p | 2 | 4 |
FNN | alpha | 1.0 | 1.0 |
FNN | activation | relu | identity |
FNN | hidden_layer_sizes | (100, 100) | (125, 155) |
FNN | learning_rate | constant | adaptive |
FNN | solver | sgd | sgd |
FNN | max_iter | 8000 | 15000 |
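The hyperparameter table above can also be kept as plain data, which makes it easy to see where the two pipelines actually differ. The sketch below assumes the settings are scikit-learn keyword arguments; the parameter names strongly suggest this, but the section does not state it, and the spellings `min_samples_leaf`, `min_samples_split`, `weights`, and `hidden_layer_sizes` as well as the lowercase string values are normalised on that assumption. The names `HYPERPARAMETERS` and `differing_hyperparameters` are hypothetical.

```python
# Hyperparameters from the table above, keyed by model and then by pipeline.
# Keyword spellings follow scikit-learn conventions (an assumption).
HYPERPARAMETERS = {
    "RF": {
        "PCA": {"criterion": "log_loss", "max_depth": None,
                "class_weight": {0: 1, 1: 3}, "min_samples_leaf": 4,
                "min_samples_split": 4, "n_estimators": 400,
                "max_features": "log2"},
        "tSVD": {"criterion": "entropy", "max_depth": None,
                 "class_weight": {0: 1, 1: 3}, "min_samples_leaf": 2,
                 "min_samples_split": 5, "n_estimators": 400,
                 "max_features": "log2"},
    },
    "GB": {
        "PCA": {"criterion": "squared_error", "learning_rate": 0.36,
                "loss": "exponential", "max_depth": 5, "n_estimators": 152},
        "tSVD": {"criterion": "friedman_mse", "learning_rate": 0.35,
                 "loss": "log_loss", "max_depth": 5, "n_estimators": 150},
    },
    "SVM": {
        "PCA": {"C": 0.12, "kernel": "sigmoid"},
        "tSVD": {"C": 0.25, "kernel": "sigmoid"},
    },
    "LR": {
        "PCA": {"C": 0.35, "solver": "liblinear", "max_iter": 5000,
                "penalty": "l2"},
        "tSVD": {"C": 0.1, "solver": "liblinear", "max_iter": 5000,
                 "penalty": "l2"},
    },
    "KNN": {
        "PCA": {"n_neighbors": 17, "weights": "distance", "p": 2},
        "tSVD": {"n_neighbors": 18, "weights": "distance", "p": 4},
    },
    "FNN": {
        "PCA": {"alpha": 1.0, "activation": "relu",
                "hidden_layer_sizes": (100, 100),
                "learning_rate": "constant", "solver": "sgd",
                "max_iter": 8000},
        "tSVD": {"alpha": 1.0, "activation": "identity",
                 "hidden_layer_sizes": (125, 155),
                 "learning_rate": "adaptive", "solver": "sgd",
                 "max_iter": 15000},
    },
}


def differing_hyperparameters(model):
    """Return {name: (pca_value, tsvd_value)} for settings that differ."""
    pca = HYPERPARAMETERS[model]["PCA"]
    tsvd = HYPERPARAMETERS[model]["tSVD"]
    return {name: (pca[name], tsvd[name])
            for name in pca if pca[name] != tsvd[name]}


for model in HYPERPARAMETERS:
    print(model, differing_hyperparameters(model))
```

If the models are indeed scikit-learn estimators, each inner dictionary could be unpacked into the matching constructor, for example `RandomForestClassifier(**HYPERPARAMETERS["RF"]["PCA"])`.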
Test: test set performance; CV: 10-fold cross-validation performance; B. Acc.: balanced accuracy; Sen.: sensitivity; Spec.: specificity; Prec.: precision.
PCA-Model Pipeline
Model | Test B. Acc. | Test F1 Score | Test AUC | Test Sen. | Test Spec. | Test Prec. | CV B. Acc. | CV F1 Score | CV AUC | CV Sen. | CV Spec. | CV Prec. |
RF | 0.906 | 0.853 | 0.977 | 0.935 | 0.877 | 0.784 | 0.882 | 0.867 | 0.966 | 0.862 | 0.902 | 0.779 |
GB | 0.849 | 0.794 | 0.948 | 0.806 | 0.892 | 0.781 | 0.868 | 0.873 | 0.946 | 0.797 | 0.939 | 0.856 |
SVM | 0.929 | 0.892 | 0.992 | 0.935 | 0.923 | 0.853 | 0.873 | 0.861 | 0.960 | 0.845 | 0.902 | 0.776 |
LR | 0.952 | 0.935 | 0.992 | 0.935 | 0.969 | 0.935 | 0.850 | 0.849 | 0.967 | 0.779 | 0.920 | 0.798 |
KNN | 0.912 | 0.885 | 0.985 | 0.871 | 0.954 | 0.900 | 0.872 | 0.881 | 0.960 | 0.788 | 0.957 | 0.883 |
FNN | 0.938 | 0.896 | 0.971 | 0.968 | 0.908 | 0.833 | 0.903 | 0.897 | 0.961 | 0.871 | 0.935 | 0.855 |
tSVD-Model Pipeline
Model | Test B. Acc. | Test F1 Score | Test AUC | Test Sen. | Test Spec. | Test Prec. | CV B. Acc. | CV F1 Score | CV AUC | CV Sen. | CV Spec. | CV Prec. |
RF | 0.937 | 0.886 | 0.986 | 0.912 | 0.938 | 0.861 | 0.889 | 0.892 | 0.965 | 0.844 | 0.949 | 0.871 |
GB | 0.896 | 0.853 | 0.978 | 0.853 | 0.938 | 0.853 | 0.884 | 0.886 | 0.957 | 0.825 | 0.938 | 0.858 |
SVM | 0.928 | 0.879 | 0.992 | 0.853 | 0.963 | 0.906 | 0.846 | 0.848 | 0.961 | 0.780 | 0.916 | 0.789 |
LR | 0.944 | 0.933 | 0.994 | 0.903 | 0.985 | 0.966 | 0.854 | 0.859 | 0.965 | 0.770 | 0.931 | 0.832 |
KNN | 0.908 | 0.885 | 0.987 | 0.853 | 0.963 | 0.906 | 0.860 | 0.868 | 0.952 | 0.770 | 0.949 | 0.862 |
FNN | 0.912 | 0.903 | 0.989 | 0.824 | 1.000 | 1.000 | 0.859 | 0.870 | 0.963 | 0.762 | 0.967 | 0.906 |
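Apart from AUC, every metric reported in the performance tables follows directly from the four confusion-matrix counts. As a worked sketch, the function below computes them and is applied to a set of counts chosen to reproduce the test-set LR row of the PCA pipeline. The counts (31 recurrences and 65 non-recurrences in the test set) are not reported in this section; they are an assumption inferred from the rounded values, and `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute the threshold-based metrics used in the tables from raw counts."""
    sensitivity = tp / (tp + fn)   # recall on the positive (recurred) class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)     # share of predicted recurrences that are real
    return {
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
    }


# Assumed counts (not given in the source), picked to match the LR test row
# of the PCA pipeline: 29 of 31 recurrences and 63 of 65 non-recurrences correct.
metrics = classification_metrics(tp=29, fn=2, tn=63, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# balanced_accuracy: 0.952
# f1: 0.935
# sensitivity: 0.935
# specificity: 0.969
# precision: 0.935
```

AUC is not included because it depends on the ranking of predicted scores rather than on a single confusion matrix.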
Study | Model | Dataset/Features | AUC | Sen. | Spec. | Comment |
Our Study | SVM | PCA/tSVD pipelines | 99.2% | 93.5% (PCA), 85.3% (tSVD) | >92% | Confirms SVM's effectiveness; aligns with Borzooei et al. |
Our Study | KNN | PCA/tSVD pipelines | >98.4% | >85% | >95% | Suggests improved predictive ability with high-variance features. |
Our Study | RF | PCA/tSVD pipelines | >97% | 93.5% (PCA), 91.2% (tSVD) | 87.7% (PCA), 93.8% (tSVD) | Consistent with Borzooei et al.; reliable performance. |
Our Study | FNN | PCA/tSVD pipelines | >97% | 96.8% (PCA), 82.4% (tSVD) | 90.8% (PCA), 100% (tSVD) | Comparable to the ANN model of Borzooei et al.; high specificity and sensitivity. |
Our Study | LR | PCA/tSVD pipelines | >99% | 93.5% (PCA), 90.6% (tSVD) | >96% | Superior to the LR model of Wang et al. |
Borzooei et al. (2024) [11] | SVM | 13 clinicopathologic features | 99.71% | 99.33% | 97.14% | High performance; aligns with our tSVD-based SVM (AUC: 99.2%). |
Borzooei et al. (2024) [11] | KNN | 13 clinicopathologic features | 98.44% | 83% | 97.14% | Our KNN models in PCA/tSVD pipelines show slightly higher AUC (>98%) and sensitivity (>85%). |
Borzooei et al. (2024) [11] | RF | 13 clinicopathologic features | 99.38% | 99.66% | 94.28% | Comparable to our RF models with AUC >97% in both PCA and tSVD pipelines. |
Borzooei et al. (2024) [11] | ANN | 13 clinicopathologic features | 99.64% | 96.6% | 95.71% | High performance comparable to our FNN model. |
Qiao et al. (2024) [51] | RF | Distant metastasis dataset | 0.960 | 92.9% | N/A | High performance similar to that of our RF model. |
Wang et al. (2024) [52] | RF | Larger cohort (2244 patients), perioperative variables | 0.766 | 0.757 | 0.682 | Lower performance than our study; variation may be due to different feature sets. |
Wang et al. (2024) [52] | LR | Larger cohort (2244 patients), perioperative variables | 0.738 | 0.865 | 0.495 | Lower performance than our LR; variation may be due to different feature sets. |
Wang et al. (2024) [52] | SVM | Larger cohort (2244 patients), perioperative variables | 0.752 | 0.568 | 0.903 | Lower performance than our SVM but with comparable specificity. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 MDPI (Basel, Switzerland) unless otherwise stated