A peer-reviewed article of this preprint also exists.
Submitted: 07 October 2024
Posted: 08 October 2024
No. of failures | Active compounds | Inactive compounds |
---|---|---|
0 | 46 | 649 |
1 | 27 | 127 |
2 | 31 | 52 |
3 | 34 | 34 |
4 | 0 | 40 |
5 | 0 | 2 |
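The counts above can be read as the share of active compounds at each number of failures. The short Python sketch below does only that arithmetic on the tabulated counts. It makes no assumption about which rule set the "failures" refer to, because the table does not say.

```python
# Counts copied from the table: number of failures -> (active, inactive)
counts = {
    0: (46, 649),
    1: (27, 127),
    2: (31, 52),
    3: (34, 34),
    4: (0, 40),
    5: (0, 2),
}

def active_fraction(table):
    """Return {failures: fraction of the compounds in that row that are active}."""
    return {k: a / (a + i) for k, (a, i) in table.items()}

fractions = active_fraction(counts)
for k, f in fractions.items():
    print(f"{k} failures: {f:.1%} active")

# Column totals over all rows
total_active = sum(a for a, _ in counts.values())
total_inactive = sum(i for _, i in counts.values())
print(f"overall: {total_active} active, {total_inactive} inactive")
```

On these counts the active share rises from about 6.6% with no failures to 50% with three, and is zero for four or five failures.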
No. | Regression algorithm | Descriptor set | Feature selection method | CCC (nested CV, n = 5): runs 1-5, mean (s.d.) | R² (nested CV, n = 5): runs 1-5, mean (s.d.) | RMSE (n = 5): runs 1-5, mean (s.d.) |
---|---|---|---|---|---|---|
1 | Random forest (“ranger”) | MACCS | “cmim” | 0.837 0.840 0.846 0.833 0.825 0.836 (0.008) | 0.707 0.716 0.745 0.698 0.720 0.717 (0.018) | 0.872 0.867 0.835 0.884 0.877 0.867 (0.019) |
2 | XGBoost | MACCS | Boruta | 0.848 0.861 0.840 0.850 0.848 0.849 (0.008) | 0.726 0.744 0.725 0.701 0.741 0.727 (0.017) | 0.848 0.821 0.873 0.870 0.835 0.849 (0.022) |
3 | Random forest (“ranger”) | MACCS | Boruta | 0.835 0.827 0.833 0.826 0.834 0.831 (0.004) | 0.712 0.696 0.722 0.679 0.707 0.702 (0.016) | 0.890 0.903 0.877 0.916 0.886 0.891 (0.018) |
4 | Support vector machines | MACCS | Boruta | 0.857 0.853 0.857 0.843 0.845 0.851 (0.007) | 0.743 0.740 0.754 0.708 0.738 0.737 (0.017) | 0.832 0.831 0.815 0.872 0.839 0.838 (0.021) |
5 | Gradient boosting machine (“GBM”) | Set2 | Boruta | 0.858 0.820 0.827 0.830 0.829 0.833 (0.015) | 0.752 0.681 0.702 0.696 0.667 0.700 (0.032) | 0.815 0.942 0.912 0.915 0.926 0.902 (0.050) |
6 | Support vector machines | Set2 | “jmim” | 0.840 0.841 0.850 0.839 0.841 0.842 (0.004) | 0.734 0.727 0.756 0.728 0.746 0.738 (0.012) | 0.854 0.850 0.824 0.849 0.838 0.843 (0.012) |
7 | BART | Set2 | Gaselect | 0.846 0.848 0.854 0.850 0.845 0.849 (0.004) | 0.730 0.739 0.745 0.733 0.689 0.727 (0.022) | 0.858 0.833 0.830 0.847 0.864 0.846 (0.015) |
8 | Random forest (“ranger”) | Set2 | Gaselect | 0.827 0.830 0.823 0.830 0.818 0.826 (0.005) | 0.733 0.730 0.708 0.742 0.723 0.727 (0.013) | 0.848 0.849 0.864 0.850 0.874 0.857 (0.011) |
9 | XGBoost | Set2 | “jmim” | 0.832 0.843 0.832 0.830 0.823 0.832 (0.007) | 0.724 0.724 0.706 0.684 0.705 0.709 (0.017) | 0.868 0.852 0.885 0.890 0.901 0.879 (0.019) |
10 | BART | Set2 | Boruta | 0.867 0.825 0.825 0.834 0.829 0.836 (0.018) | 0.764 0.690 0.697 0.707 0.674 0.704 (0.034) | 0.797 0.929 0.917 0.901 0.919 0.893 (0.054) |
11 | Rule- and instance-based regression | Set2 | Gaselect | 0.837 0.821 0.835 0.843 0.823 0.832 (0.009) | 0.724 0.707 0.717 0.727 0.660 0.707 (0.027) | 0.860 0.891 0.881 0.851 0.893 0.875 (0.019) |
12 | Support vector machines | Set2 | Gaselect | 0.849 0.853 0.840 0.856 0.851 0.850 (0.006) | 0.748 0.754 0.715 0.766 0.757 0.748 (0.020) | 0.821 0.804 0.846 0.804 0.819 0.819 (0.017) |
13 | Random forest (“ranger”) | Set3 | Gaselect | 0.812 0.832 0.826 0.826 0.823 0.824 (0.007) | 0.702 0.720 0.727 0.731 0.717 0.719 (0.011) | 0.873 0.841 0.860 0.862 0.876 0.862 (0.014) |
14 | BART | Set4 | “jmim” | 0.864 0.845 0.858 0.853 0.852 0.854 (0.007) | 0.751 0.710 0.742 0.731 0.730 0.733 (0.015) | 0.821 0.888 0.844 0.858 0.856 0.853 (0.024) |
15 | Weighted k-nearest neighbor | Set4 | Boruta | 0.826 0.854 0.865 0.848 0.858 0.850 (0.015) | 0.690 0.740 0.739 0.709 0.737 0.723 (0.022) | 0.923 0.846 0.821 0.874 0.858 0.864 (0.038) |
16 | BART | Set4 | Gaselect | 0.856 0.847 0.845 0.846 0.859 0.851 (0.006) | 0.743 0.726 0.715 0.719 0.745 0.730 (0.014) | 0.833 0.865 0.871 0.877 0.848 0.859 (0.018) |
17 | XGBoost | Set4 | “jmim” | 0.835 0.832 0.856 0.830 0.846 0.840 (0.011) | 0.701 0.702 0.750 0.705 0.737 0.719 (0.022) | 0.857 0.896 0.825 0.902 0.844 0.865 (0.033) |
18 | Random forest (“ranger”) | Set4 | Boruta | 0.831 0.846 0.857 0.855 0.847 0.847 (0.010) | 0.702 0.748 0.754 0.749 0.738 0.738 (0.021) | 0.861 0.835 0.796 0.829 0.848 0.834 (0.024) |
19 | Rule- and instance-based regression | Set4 | Gaselect | 0.851 0.813 0.841 0.842 0.837 0.837 (0.014) | 0.726 0.655 0.721 0.715 0.688 0.701 (0.030) | 0.859 0.947 0.882 0.879 0.896 0.893 (0.033) |
20 | BART | Set4 | Boruta | 0.856 0.870 0.868 0.872 0.877 0.869 (0.008) | 0.743 0.757 0.743 0.760 0.768 0.754 (0.011) | 0.833 0.814 0.836 0.805 0.800 0.818 (0.016) |
21 | XGBoost | Set4 | Boruta | 0.850 0.849 0.850 0.855 0.843 0.849 (0.004) | 0.737 0.747 0.734 0.738 0.711 0.733 (0.013) | 0.848 0.834 0.838 0.842 0.893 0.851 (0.024) |
Ensemble algorithm | CCC (nested CV) | R² (nested CV) | RMSE (nested CV) |
---|---|---|---|
Support vector machines | 0.893 | 0.798 | 0.730 |
BART | 0.888 | 0.789 | 0.745 |
Weighted k-nearest neighbor (KKNN) | 0.887 | 0.789 | 0.750 |
Random forest | 0.889 | 0.794 | 0.739 |
XGBoost | 0.883 | 0.784 | 0.760 |
Model whose features were randomized | CCC (nested CV, n = 20) mean (s.d.) | Rr² (nested CV, n = 20) mean (s.d.) | RMSE (n = 20) mean (s.d.) | Rp² (for the corresponding model) |
---|---|---|---|---|
Model 19 in Table 2 | 0.047 (0.055) | -0.220 (0.077) | 1.804 (0.045) | 0.803 |
Model 17 in Table 2 | -0.007 (0.034) | -0.113 (0.068) | 1.731 (0.027) | 0.773 |
Model 20 in Table 2 | 0.078 (0.060) | -0.056 (0.040) | 1.685 (0.032) | 0.781 |
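The Rp² column appears consistent with the corrected randomization metric cRp² = R·√(R² − Rr²), where R² is the mean nested-CV R² of the unrandomized model in Table 2 and Rr² is the mean R² of its randomized counterparts. This reading is an inference, not something the table states. The Python sketch below recomputes the column from the tabulated means; because every input is rounded to three decimals, agreement is expected only to within about ±0.001.

```python
import math

def c_rp2(r2_model, r2_randomized):
    """cRp^2 = R * sqrt(R^2 - Rr^2), with R = sqrt(R^2) of the unrandomized model."""
    return math.sqrt(r2_model) * math.sqrt(r2_model - r2_randomized)

# (mean nested-CV R^2 from Table 2, mean Rr^2 of the randomized models, reported Rp^2)
rows = {
    "Model 19": (0.701, -0.220, 0.803),
    "Model 17": (0.719, -0.113, 0.773),
    "Model 20": (0.754, -0.056, 0.781),
}

for name, (r2, rr2, reported) in rows.items():
    value = c_rp2(r2, rr2)
    print(f"{name}: recomputed {value:.4f}, reported {reported}")
```

Because the randomized models score near or below zero, R² − Rr² exceeds R², which is why the reported Rp² values sit above the R² of the original models.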
MACCS key | Structural pattern | Association |
---|---|---|
62 | "A$A!A$A" (any atom – ring bond – any atom – chain bond – any atom – ring bond – any atom) | Positive |
85 | CN(C)C (a nitrogen atom bonded to three carbon atoms) | Positive |
105 | "A$A($A)$A" (an atom bonded through ring bonds to three other atoms, as at a ring fusion) | Negative |
22 | Three-membered ring (3M ring) | Relatively strongly negative |
65 | C%N (carbon and nitrogen joined by an aromatic query bond) | Positive |
145 | 6M ring > 1 (more than one six-membered ring) | Positive |
89 | OAAAO (two oxygen atoms separated by a chain of three atoms) | Positive |
97 | NAAAO (a nitrogen atom and an oxygen atom separated by a chain of three atoms, i.e. four bonds) | Weakly negative |
107 | XA(A)A (a halogen X bonded to an atom that has two further neighbors; A = any atom) | Weakly positive |
42 | F (a fluorine atom) | Weakly positive |
Descriptor | Correlation coefficient(s) with correlated descriptors | Correlated descriptors | Activity relationship |
---|---|---|---|
MATS3e (Moran autocorrelation of lag 3 weighted by Sanderson electronegativity) | r = 0.846 | MATS3s (Moran autocorrelation of lag 3 weighted by I-state) | Negative values → higher activity |
SpMax_B(p) (Leading eigenvalue from Burden matrix weighted by polarizability) | r > 0.91; r > 0.80 | SpDiam_B(p) (Diameter from Burden matrix weighted by polarizability), SpMax1_Bh(p) (Leading eigenvalue n. 1 of Burden matrix weighted by polarizability), piPC06 (molecular multiple path count of order 6), SpDiam_B(v) (spectral diameter from Burden matrix weighted by van der Waals volume), SpMax_B(v) | Inverted U-shape |
VE1sign_B(s) (Coefficient sum of the last eigenvector from Burden matrix weighted by I-State) | N/A | None | Higher values → lower activity |
SpMin1_Bh(e) (Smallest eigenvalue n. 1 of Burden matrix weighted by Sanderson electronegativity) | r = 0.99; r > 0.87; -0.80 | SpMin1_Bh(i) (Smallest eigenvalue n. 1 of Burden matrix weighted by ionization potential), SpMin1_Bh(v) (Smallest eigenvalue n. 1 of Burden matrix weighted by van der Waals volume), SpMin1_Bh(p) (Smallest eigenvalue n. 1 of Burden matrix weighted by polarizability), WiA_D/Dt (average Wiener-like index from distance/detour matrix) | Negative association with an asymmetric inverted U-shape |
SM3_X (Spectral moment of order 3 from chi matrix) | r > 0.90; r = 0.81 | nR03 (Number of 3-membered rings), D/Dtr03 (Distance/detour ring index of order 3), SRW03 (Self-returning walk count of order 3), SM5_X (Spectral moment of order 5 from chi matrix), B04[N-S] (Presence/absence of N – S at topological distance 4), B06[O-S] (Presence/absence of O – S at topological distance 6), F06[O-S] (Frequency of O – S at topological distance 6) | Negative correlation with pIC50 |
GATS5v (Geary autocorrelation of lag 5 weighted by van der Waals volume) | r = -0.903; r = 0.80 | MATS5p (Moran autocorrelation of lag 5 weighted by polarizability), GATS5p (Geary autocorrelation of lag 5 weighted by polarizability) | Increasing values → higher activity |
MATS1p (Moran autocorrelation of lag 1 weighted by polarizability) | r = 0.93; r = 0.87 | MATS1v (Moran autocorrelation of lag 1 weighted by van der Waals volume), MATS1i (Moran autocorrelation of lag 1 weighted by ionization potential) | Inverted U-shaped relationship with activity |
JGI5 (Mean topological charge index of order 5) | N/A | None | Higher values → higher inhibitory activity |
TI2_L (Second Mohar index from Laplace matrix) | r > 0.8 for all, none > 0.9 | MSD (Mean square distance index (Balaban)), AECC (Average eccentricity), DECC (Eccentric), ICR (Radial centric information index), MaxTD (Max topological distance), S3K (3-path Kier alpha-modified shape index), IDE (Mean information content on the distance equality), HVcpx (Graph vertex complexity index), WiA_Dz(Z) (Average Wiener-like index from Barysz matrix weighted by atomic number), SpPosA_Dz(Z) (Normalized spectral positive sum from Barysz matrix weighted by atomic number), SpMaxA_Dz(Z) (Normalized leading eigenvalue from Barysz matrix weighted by atomic number), SpMAD_Dz(Z) (Spectral mean absolute deviation from Barysz matrix weighted by atomic number), WiA_Dz(m) (Average Wiener-like index from Barysz matrix weighted by mass), SpPosA_Dz(m) (Normalized spectral positive sum from Barysz matrix weighted by mass), SpMaxA_Dz(m) (Normalized leading eigenvalue from Barysz matrix weighted by mass), SpMAD_Dz(m) (Spectral mean absolute deviation from Barysz matrix weighted by mass), WiA_Dz(v) (Average Wiener-like index from Barysz matrix weighted by van der Waals volume), SpPosA_Dz(v) (Normalized spectral positive sum from Barysz matrix weighted by van der Waals volume), SpMaxA_Dz(v) (Normalized leading eigenvalue from Barysz matrix weighted by van der Waals volume), SpMAD_Dz(v) (Spectral mean absolute deviation from Barysz matrix weighted by van der Waals volume), WiA_Dz(e) (Average Wiener-like index from Barysz matrix weighted by Sand… | Higher values → lower inhibitory activity |
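Several descriptors in the table above (MATS3e, GATS5v, MATS1p and their correlates) are Moran (MATS) and Geary (GATS) autocorrelations of an atomic property at a given topological lag. The Python sketch below computes both on a hypothetical four-atom chain with made-up property values, using the standard definitions. Moran's coefficient averages the products of centred property values over atom pairs exactly d bonds apart and divides by the variance; Geary's coefficient averages half the squared differences and divides by the sample variance. It illustrates the formulas only. Descriptor software applies its own property scaling and graph conventions, which the sketch does not reproduce.

```python
from collections import deque

def topological_distances(adjacency):
    """All-pairs shortest-path lengths (in bonds) via breadth-first search."""
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[start][v] is None:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist

def moran_geary(adjacency, weights, lag):
    """Return (MATS, GATS) at one lag; assumes at least one atom pair at that lag."""
    n = len(weights)
    dist = topological_distances(adjacency)
    mean_w = sum(weights) / n
    dev = [w - mean_w for w in weights]
    ss = sum(d * d for d in dev)
    # Unordered atom pairs exactly `lag` bonds apart
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if dist[i][j] == lag]
    n_pairs = len(pairs)
    moran_num = sum(dev[i] * dev[j] for i, j in pairs) / n_pairs
    geary_num = sum((weights[i] - weights[j]) ** 2 for i, j in pairs) / (2 * n_pairs)
    mats = moran_num / (ss / n)
    gats = geary_num / (ss / (n - 1))
    return mats, gats

# Hypothetical 4-atom chain 0-1-2-3 with made-up atomic property values
chain = [[1], [0, 2], [1, 3], [2]]
props = [1.0, 2.0, 3.0, 4.0]
for lag in (1, 2, 3):
    print(lag, moran_geary(chain, props, lag))
```

In this toy chain, neighbouring atoms carry similar values, so MATS is positive and GATS is below 1 at lag 1. The chain ends differ most, so MATS is strongly negative and GATS is well above 1 at lag 3.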
Descriptor | Correlated Descriptors | Correlation coefficient(s) | Activity Relationship |
---|---|---|---|
C-034 (R–CR..X) | nPyrroles (number of pyrrole rings), N-073 (Ar2NH / Ar3N / Ar2N-Al / R..N..R), SaasN (sum of aasN E-states), NaasN (number of atoms of type aasN) | r = 0.89–0.90 | Higher values → higher activity |
SHED_AA (Shannon entropy descriptor, acceptor-acceptor) | SHED_DA (Shannon entropy descriptor, donor-acceptor) | r = 0.91 | Lower values → higher activity |
C-003 (a CHR3 group) | nCt (number of total tertiary C), nCrt (number of ring tertiary C) | r = 0.88–0.99 | ≤3 → lower activity, 4 or 5 → higher activity |
nCrt (number of ring tertiary C) | nCt, C-003, SpMin1_Bh(s) (smallest eigenvalue n. 1 of Burden matrix weighted by I-state) | r = 0.80–0.88 | 0 → higher activity, ≥1 → lower activity |
CATS2D_04_AA (CATS2D Acceptor-Acceptor at lag 04) | F04[O-O] (Frequency of O – O at topological distance 4) | r = 0.81 | ≥3 → stronger activity |
NsF (number of atoms of type sF, i.e. -F) | nF (number of fluorine atoms), nX (number of halogen atoms), P_VSA_e_6 (P_VSA-like on Sanderson electronegativity, bin 6), F-084 (F attached to C1(sp2)), SsF (sum of sF E-states), F01[C-F] (frequency of C – F at topological distance 1), F02[C-F] (frequency of C – F at topological distance 2), F03[C-F] (frequency of C – F at topological distance 3), F07[C-F] (frequency of C – F at topological distance 7), F08[C-F] (frequency of C – F at topological distance 8) | r > 0.9 or r = 1.0 | Fluorinated → higher activity |
CATS2D_04_DA (CATS2D Donor-Acceptor at lag 04) | CATS2D_04_DD, F04[O-O] | r > 0.80 | Higher values → slightly higher inhibition |
SHED_AN (Shannon entropy descriptor, acceptor-negative) | SHED_DN, CATS2D_01_DN (CATS2D Donor-Negative at lag 01), CATS2D_00_NN (CATS2D Negative-Negative at lag 00, i.e. number of negative atoms) | r > 0.90 | Higher values → slightly lower activity |
CATS2D_02_AL (CATS2D Acceptor-Lipophilic at lag 02) | F04[O-O] | r = 0.84 | Higher values → slightly higher inhibition |
CATS2D_09_DL (CATS2D Donor-Lipophilic at lag 09) | CATS2D_02_DL, CATS2D_07_DL, CATS2D_08_DL | r > 0.80 | Lower values → higher inhibitory activity |
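Descriptors such as F04[O-O], F01[C-F] and B04[N-S] count (F) or flag the presence (B) of a given pair of element types separated by exactly k bonds in the hydrogen-depleted molecular graph. The minimal Python sketch below computes both kinds of value for 1,3-propanediol (HOCH2CH2CH2OH), chosen as a hypothetical example. The graph encoding and the function names are illustrative and are not taken from any descriptor package.

```python
from collections import deque

def distances_from(adjacency, start):
    """Shortest-path lengths (in bonds) from one atom, by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def pair_frequency(elements, adjacency, a, b, k):
    """F_k[a-b]: number of unordered atom pairs of element types a and b at distance k."""
    count = 0
    n = len(elements)
    for i in range(n):
        dist = distances_from(adjacency, i)
        for j in range(i + 1, n):
            if dist.get(j) == k and {elements[i], elements[j]} == {a, b}:
                count += 1
    return count

def pair_presence(elements, adjacency, a, b, k):
    """B_k[a-b]: 1 if at least one such pair exists, otherwise 0."""
    return int(pair_frequency(elements, adjacency, a, b, k) > 0)

# Hypothetical example: heavy atoms of 1,3-propanediol, O0-C1-C2-C3-O4
elements = ["O", "C", "C", "C", "O"]
adjacency = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(pair_frequency(elements, adjacency, "O", "O", 4))  # F04[O-O]
print(pair_frequency(elements, adjacency, "C", "O", 1))  # F01[C-O]
print(pair_presence(elements, adjacency, "N", "S", 4))   # B04[N-S]
```

In this molecule the two hydroxyl oxygens are exactly four bonds apart, so F04[O-O] is 1. With no nitrogen or sulfur present, B04[N-S] is 0.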
No. | Compound | IC50* (μM) | IC50** (μM) |
---|---|---|---|
Isoflavonoids | |||
1 | irigenin (5,7,3'-trihydroxy-6,4',5'-trimethoxyisoflavone) | 0.56 | 1.37 |
2 | tectoridin (shekanin; 4',5-dihydroxy-6-methoxyisoflavone 7-O-glucoside) | 0.84 | 0.72 |
3 | irisolidone (4'-O-methyltectorigenin) | 0.53 | 1.24 |
4 | iristectorin A | 0.89 | 0.82 |
5 | iristectorigenin B | 0.54 | 1.12 |
6 | homotectoridin | 0.87 | 0.70 |
7 | germanaism A | 0.52 | |
8 | irilone 4'-O-glucoside | 0.53 | 0.73 |
9 | germanaism B | 0.64 | 0.80 |
10 | germanaism A | 0.52 | 0.95 |
11 | kakkalidone (irisolidone 7-O-β-D-glucoside and its stereoisomers) | 0.59 | 0.75 |
12 | homotectoridin | 0.87 | |
13 | irisflorentin | 1.73 | 0.80 |
14 | pratensein 7-O-glucopyranoside | 2.08 | 0.82 |
15 | germanaism G | 2.34 | 0.82 |
16 | 3-(3-hydroxy-4,5-dimethoxyphenyl)-7-[(2S,3R,4S,5S,6R)-3,4,5-trihydroxy-6-(hydroxymethyl)oxan-2-yl]oxychromen-4-one | 1.24 | 0.69 |
17 | 5-hydroxy-3-(3-hydroxy-4,5-dimethoxyphenyl)-7-[(2R,3S,4R,5R,6S)-3,4,5-trihydroxy-6-(hydroxymethyl)oxan-2-yl]oxychromen-4-one | 1.41 | 0.78 |
18 | germanaism D | 2.44 | 0.85 |
Flavonoids | |||
19 | isoswertiajaponin | 0.83 | 0.97 |
20 | swertisin (flavocommelitin, 6-C-glucopyranosyl-7-O-methylapigenin) | 1.24 | 0.84 |
21 | isoswertisin (isoflavocommelitin, 7-O-methylvitexin) | 1.07 | 0.85 |
22 | embigenin | 1.30 | 0.66 |
Terpenoids | |||
23 | iriflorentan (2Z-2-[(2R,3S,4S)-4-hydroxy-3-(hydroxymethyl)-2-(3-hydroxypropyl)-4-methyl-3-[(3E,5E)-4-methyl-6-[(1R,3S)-2,2,3-trimethyl-6-methylidenecyclohexyl]hexa-3,5-dienyl]cyclohexylidene]propanal) | 0.62 | 1.51 |
24 | germanical C (2-[4-hydroxy-3-(hydroxymethyl)-2-(3-hydroxypropyl)-4-methyl-3-[4-methyl-6-(2,5,6,6-tetramethylcyclohex-2-en-1-yl)hexa-3,5-dien-1-yl]cyclohexylidene]propanal) | 0.76 | 1.65 |
25 | irisgermanical B (2-[4-hydroxy-3-(hydroxymethyl)-2-(3-hydroxypropyl)-4-methyl-3-[4-methyl-6-(2,2,3-trimethyl-6-methylidenecyclohexyl)hexa-3,5-dien-1-yl]cyclohexylidene]propanal) | 0.62 | 1.51 |
Xanthonoids | |||
26 | mangiferin | 1.68 | 0.91 |
27 | irisxanthone | 1.84 | 0.98 |
28 | isomangiferin | 2.49 | 0.94 |
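The descriptor relationships earlier in the document are expressed against pIC50, while this table lists IC50 in μM. Taking the μM units in the header at face value, pIC50 = −log10(IC50 in mol/L) = 6 − log10(IC50 in μM). The Python sketch below applies this conversion to a few rows; the conversion itself does not depend on how the IC50* and IC50** columns were obtained.

```python
import math

def pic50_from_um(ic50_um):
    """pIC50 = -log10(IC50 [M]); 1 uM = 1e-6 M, so pIC50 = 6 - log10(IC50 [uM])."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return 6.0 - math.log10(ic50_um)

# (compound, IC50* in uM, IC50** in uM) for a few rows of the table
rows = [
    ("irigenin", 0.56, 1.37),
    ("mangiferin", 1.68, 0.91),
    ("isomangiferin", 2.49, 0.94),
]
for name, ic50_a, ic50_b in rows:
    print(f"{name}: pIC50* = {pic50_from_um(ic50_a):.2f}, pIC50** = {pic50_from_um(ic50_b):.2f}")
```

Because the scale is logarithmic, a tenfold drop in IC50 raises pIC50 by exactly one unit. Lower IC50 values therefore appear as higher pIC50.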
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 MDPI (Basel, Switzerland) unless otherwise stated