3.2. Surface Plasmon Resonance (SPR) Binding Data
In preliminary rounds of assay development and initial data collection on the surfaces described above, small-molecule analytes were analyzed in running buffer composed of PBS with 0.05% Tween 20 and either 1% or 2% DMSO. Each analyte was titrated as a 2-fold serial dilution of 10 concentrations, with 200 μM as the highest concentration. The best representative data for compounds 1, 1b, 1d, and 2 that fit the steady state affinity model are shown in Table 1. Some of the best representative SPR sensorgrams showing simultaneous binding to Spike and S2 with good quality fits are shown in Figure 4. For the SPR data presented in Figure 4, all compounds show clear sensorgram evidence of binding, with a concentration-dependent change in response units (RU) and good quality fits to the steady state affinity model.
Figure 4 shows compounds of four different structural classes that bind to the S2 segment as well as to the full-length Spike protein. While some of the lower affinity compounds (1c, 1d, and 4) exhibited larger differences in affinity between S2 and Spike, the higher affinity reference compounds 1, 2, and 3 all had reproducibly similar SPR sensorgrams and fits, with smaller differences in affinity between S2 and Spike: for 1 (S2 = 5.9 μM) (Spike = 7.4 μM), for 2 (S2 = 6.3 μM) (Spike = 5.4 μM), and for 3 (S2 = 4.1 μM) (Spike = 4.1 μM). These binding data strongly suggest that the binding site of the small molecules is located on the S2 segment of the full-length protein.
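The titration design and the steady state affinity model referred to above can be sketched as follows. This is only a minimal illustration, assuming the standard 1:1 steady-state isotherm Req = Rmax·C/(Kd + C). The function names, the grid-search fitting routine, and the Rmax value of 30 RU are hypothetical and are not taken from the instrument software or from the fitted data; the Kd of 5.9 μM is used only as an example input.

```python
def dilution_series(top_uM=200.0, factor=2.0, n_points=10):
    """Return the n_points concentrations (in uM) of a serial dilution."""
    return [top_uM / factor**i for i in range(n_points)]


def steady_state_response(conc_uM, kd_uM, rmax):
    """Assumed 1:1 steady-state isotherm: Req = Rmax * C / (Kd + C)."""
    return rmax * conc_uM / (kd_uM + conc_uM)


def fit_steady_state(concs_uM, responses):
    """Least-squares estimate of (Kd, Rmax) by scanning Kd on a log grid.

    For a fixed Kd the model is linear in Rmax, so the best Rmax has a
    closed form; only Kd is scanned (0.1 uM to 1000 uM here).
    """
    best = None
    for k in range(2001):
        kd = 10 ** (-1 + 4 * k / 2000)
        frac = [c / (kd + c) for c in concs_uM]
        rmax = sum(f * r for f, r in zip(frac, responses)) / sum(f * f for f in frac)
        sse = sum((r - rmax * f) ** 2 for f, r in zip(frac, responses))
        if best is None or sse < best[0]:
            best = (sse, kd, rmax)
    return best[1], best[2]


concs = dilution_series()  # 200, 100, ..., about 0.39 uM
# Synthetic, noise-free responses for an assumed Kd of 5.9 uM and Rmax of 30 RU
simulated = [steady_state_response(c, 5.9, 30.0) for c in concs]
kd_fit, rmax_fit = fit_steady_state(concs, simulated)
print(f"Kd ~ {kd_fit:.2f} uM, Rmax ~ {rmax_fit:.1f} RU")
```

With real sensorgrams, the responses would be the plateau RU values at each concentration, and a dedicated nonlinear fitting routine would normally replace the simple grid scan used here.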
The observed affinities in this round (Kd = 5.9 μM to 7.4 μM) for Arbidol 1 were well within the range of antiviral activities reported for Arbidol 1 in the literature (EC50s = 4.1 μM to 10.0 μM) [1,2] and in our previous work (EC50 = 5.6 μM) [20]. For 1 and its derivatives 1c and 1d, the affinity for the S2 segment was generally slightly more favorable than for the full-length Spike protein, for each individual compound. This is shown in
Figure 2C where 1 binds with a slightly higher affinity to the S2 segment (Kd = 5.9 μM) compared to full-length Spike (Kd = 7.4 μM) shown in Figure 2B. The same trend (Kd S2 < Kd Spike) is seen for derivatives 1c and 1d in Figure 4A and Figure 4B, respectively. While the observed trend (Kd S2 < Kd Spike) was not necessarily expected, it may be rationalized if the small-molecule binding sites on the cleaved S2 segment are more dynamic or more amenable to complementary induced-fit binding than those on the much larger full-length trimer.
An important observation was that, for either the full-length Spike or the S2 segment, the binding data exhibited the expected structure-activity relationship (SAR) for the derivative series, such that for Spike the Kd values ranked (1 < 1c) and (1 < 1d) [20]. Thus, 1 had a higher affinity than either derivative 1c or 1d, as expected from both virtual screening data and experimental CPE data for 1 (EC50 = 5.6 μM) and 1b (EC50 = 29.5 μM) [20]. The same was observed when considering only the data on the S2 segment: 1 had a higher affinity than either derivative 1c or 1d, as expected, providing additional confidence in the interpretation of the SPR data observed for different compounds. The observed affinity (Kd = 4.1 μM) for Toremifene 3 was very close to the reported antiviral activity of Toremifene 3 in live SARS-CoV-2 infections (EC50 = 3.58 μM) [47] and in SARS-CoV-2 pseudovirus entry assays (EC50 = 1.92 μM) [48]. Observing the expected SAR for 1, 1c and 1d, together with reasonable agreement with the reported antiviral activity for 1 and 3, helps to establish that the SPR results can be interpreted consistently from more than one perspective.
In a final round of compound characterization, a new SA chip was prepared to facilitate collection of duplicate sensorgrams using optimized conditions and 2% DMSO running buffer. Independent duplicates were compared for each compound by performing two separate concentration series with starting concentrations of 100 μM and 50 μM, respectively. This dataset yielded five compounds with good quality duplicates for binding to the S2 segment. The best SPR duplicates for binding to the S2 segment are shown in Figure 5. The best representative data for compounds 1, 1c, 2, 3, and 4 and fits to the steady state affinity model are summarized in Table 2, where statistics are presented for the duplicates. While the Kd of 1 binding to S2 reported in Table 2 (Kd = 2.1 ± 0.2 μM) is slightly lower in this dataset than that reported in Table 1 (Kd = 5.9 μM), it remains much lower than that of the derivative 1c (Kd = 11.4 ± 1.3 μM). Thus, the binding data in the duplicates also follow the expected SAR for the derivative series: for binding to S2, 1 had a higher affinity than derivative 1c, as expected [20]. Again, in this dataset of duplicate measurements, the observed affinity for Spike (Kd = 4.1 μM) for Toremifene 3 was very close to several reported antiviral activities (EC50s = 1.9 μM to 3.6 μM) [47,48].
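As a small worked example of the duplicate statistics and the SAR comparison above, the sketch below summarizes two replicate Kd values and computes the fold difference between 1 and 1c on S2. It assumes that the "±" values are sample standard deviations, which the text does not state; the two replicate values are hypothetical, and only the mean Kd values of 2.1 μM and 11.4 μM come from the text.

```python
from statistics import mean, stdev


def summarize_duplicates(kd_values_uM):
    """Mean and sample standard deviation of replicate Kd values (uM)."""
    return mean(kd_values_uM), stdev(kd_values_uM)


def fold_difference(kd_reference_uM, kd_derivative_uM):
    """How many times weaker the derivative binds than the reference."""
    return kd_derivative_uM / kd_reference_uM


# Hypothetical duplicate Kd values, chosen only to illustrate the calculation
kd_mean, kd_sd = summarize_duplicates([1.9, 2.3])
# Mean Kd values quoted in the text for 1 and 1c binding to S2
fold = fold_difference(2.1, 11.4)
print(f"{kd_mean:.1f} +/- {kd_sd:.1f} uM; 1c binds about {fold:.1f}-fold weaker than 1")
```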
As mentioned previously, Clofazimine 2 was found to provide the most reliable and highest quality SPR binding data to S2 over the conditions explored. In a recently published series of Clofazimine derivatives, the affinity of 2 for full-length Spike by SPR was reported as (Spike = 3.82 μM) [23], very close to the best representative data for 2 (S2 = 3.9 μM) (Spike = 2.9 μM) from the first round as presented in Table 1. The best duplicate data for 2 (S2 = 6.5 ± 0.3 μM) (Spike = 4.6 ± 1.2 μM) are presented in Table 2. While the affinity is a bit lower in the duplicate dataset, both datasets show that Clofazimine 2 binds to the S2 segment under these conditions.
While Clofazimine 2 has been reported to have potent antiviral activity in CPE assays, ranging from (EC50 ~ 0.08 μM to 0.56 μM) [4,5,6,7] with a consensus of (EC50 = 0.31 μM) [4,5], the antiviral activity of 2 has been reported to result from a combination of Spike-dependent fusion inhibition and inhibition of Nsp13 helicase unwinding activity [5]. The dose-response curves of 2 in the same study [5] suggest that the micromolar inhibition of viral fusion (EC50 ~ 2.5 μM to 5.0 μM) is slightly more potent than the inhibition of Nsp13 helicase unwinding activity (EC50 ~ 7.5 μM to 10 μM) [5].
Interestingly, in other independent assays that report SARS-CoV-2 Spike-mediated fusion activity [49], Clofazimine 2 is also less potent (EC50 = 2.56 μM) [49] than in the CPE assays, which is much closer to the currently measured affinity for full-length Spike (Kd = 2.9 μM to 4.6 μM) by SPR and to another reported value (Kd = 3.82 μM) for full-length Spike by SPR [23]. In summary, while the affinity of Clofazimine 2 observed by SPR is in the micromolar range (Kd = 2.9 μM to 4.6 μM), weaker than the more potent observed antiviral activity (EC50 = 0.31 μM) [4,5], this is in agreement with observations from viral fusion assays [5,49], SPR binding [23], and the concept that the antiviral activity results from dual-targeted drug action on at least Spike and the Nsp13 helicase [5]. While Clofazimine has been reported to be a viral fusion inhibitor, to our knowledge it has not yet been reported that Clofazimine binds to the S2 segment of Spike. As Clofazimine is an important clinical candidate, narrowing down its mode of action as a direct-acting fusion inhibitor is important. The SPR data show that Clofazimine 2 binds to a well-formed binding site on the S2 segment trimer.
3.3. Predicting the Clofazimine Binding Site on S2 with Molecular Docking
While the Arbidol 1 binding site has been experimentally determined by Shuster et al. [31], no experimental structure of a small-molecule fusion inhibitor bound to the Spike S2 segment, solved by either X-ray crystallography or cryo-EM, has yet been published. From docking 2 into all the TOP50 binding sites predicted on the S2 segment [19], the two most favorable sites were identified and are shown in Figure 6A. Site 2, shown in (Figure 3B), is the only feasible binding site for 2 according to our modeling data (Figure 6B), whereas Site 1 is predicted to be much less thermodynamically favorable for binding of 2. From analysis of molecular docking and calculated (ΔGbind) values at all 50 sites [19,20], Site 2 is readily identified as the most favorable site, a conclusion also supported by the identification of two other structurally related, 3-fold symmetric sites. In terms of predicted (ΔGbind) values from the statistics of the top-ranked cluster (as a triplicate), Site 2 (ΔGbind = -9.4 ± 0.3 kcal/mol) is much more thermodynamically favorable than Site 1 (ΔGbind = -7.6 ± 0.2 kcal/mol), the Arbidol 1 binding site. The protein-ligand interactions of Clofazimine 2 modeled at Site 2 are complementary and favorable, as described in more detail in the next section. In summary, as shown from docking and predicted (ΔGbind) values (Figure 6B), Clofazimine 2 and Toremifene 3 are predicted to bind more favorably at Site 2 than at Site 1. Ecliptasaponin A 4 is predicted to bind favorably at Site 1, the Arbidol binding site (Figure 6A). We have previously demonstrated that a series of oleanolic acid (OA) saponin derivatives is best modeled at Site 1 rather than Site 2 on the S2 segment [20], and Ecliptasaponin A 4 is closely related in structure to these OA saponin derivatives, such as 12a. Both 4 and 12a are predicted to bind more favorably at Site 1, the Arbidol 1 binding site, as shown in Figure 6.
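To give a sense of scale for the difference between the two predicted (ΔGbind) values for 2, the sketch below converts a free energy difference into an implied ratio of dissociation constants using ΔG = RT ln Kd. This is a rough illustration only: it assumes that the predicted scores can be read as standard binding free energies at 298.15 K, which the scoring function may not satisfy, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def implied_kd_ratio(dg_site_a, dg_site_b, temp_k=298.15):
    """Ratio Kd(site b) / Kd(site a) implied by dG = RT ln Kd (kcal/mol)."""
    return math.exp((dg_site_b - dg_site_a) / (R_KCAL * temp_k))


# Predicted values quoted in the text: Site 2 = -9.4, Site 1 = -7.6 kcal/mol
ratio = implied_kd_ratio(-9.4, -7.6)
print(f"Site 1 Kd would be about {ratio:.0f}-fold higher than Site 2 Kd")
```

Under these assumptions, a difference of 1.8 kcal/mol corresponds to roughly a 20-fold difference in Kd, which is why the gap between the two sites is described as large relative to the reported uncertainties of 0.2 to 0.3 kcal/mol.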
3.4. Modeling a Series of Clofazimine Derivatives Binding to the S2 Segment
Beyond the fact that Clofazimine 2 is predicted to bind much more favorably to Site 2 than to Site 1 according to calculated (ΔGbind) values, an independent line of evidence from modeling also corroborates Site 2. Recently, a new series of chemical derivatives of Clofazimine 2 was published with antiviral activity data against SARS-CoV-2 [23]. Using the same methods, 18 derivatives were modeled binding to Site 1 and Site 2. For each derivative in the series, a top-ranked cluster was determined independently by docking numerous initial starting conformations, rather than simply modeling all derivatives in the binding mode of the reference compound.
In modeling the series of 18 derivatives at Site 2, the “untrained” predicted ΔGbind values exhibited some correlation with the experimental EC50 values. The Pearson’s R2 correlation coefficient was R2 = 0.264 for all 18 compounds (Figure 7A) modeled at Site 2. In comparison, the 18 compounds (Figure 7A) modeled at Site 1 exhibited a positive correlation but with a very low correlation coefficient, R2 = 0.029. Thus, for all 18 compounds, the “untrained” predicted ΔGbind values correlated much better at Site 2 (R2 = 0.264) than at Site 1 (R2 = 0.029). Compared to previous benchmark studies characterizing this scoring function method and its performance against datasets of diverse protein binding site architectures and protein-ligand interactions, these levels of R2 correlation are adequate to establish confidence in the binding model as reflecting experimental SAR data [20,35,36], compared to models with zero correlation (R2 = 0.0). While the robustness of this correlation analysis may be determined rigorously using a cross-validation approach [
20], this is not required in this situation, as the series may also be easily modeled as two separate series of derivatives. One series is structurally related to the reference compound Clofazimine 2 (6d, 6e, 7a, 7b, 7c, 7d, 7e, 7f, 7g, 7i, 7k, 7m, 7o) and the other series is based on a different reference compound substructure (15a, 15b, 15f, 15g, 15h). Pearson’s R2 correlation values range from R2 = 0.264 for all 18 compounds (Figure 7A) modeled at Site 2 to even higher correlation coefficients of R2 = 0.306 to 0.311 when the dataset is modeled as these two separate series of “untrained” predicted ΔGbind rankings, which have similar slopes (Figure 7B). Compared to previous studies using this approach, the observed R2 correlations and slopes for the two series are sufficiently similar [20,35,36].
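The correlation analysis described above can be illustrated with a short sketch that returns the slope and the squared Pearson correlation (R2) between predicted and experimental values. The function name and the input numbers are hypothetical and are not data from the study. How the experimental EC50 values were transformed before the regression is not specified here, so the second list simply stands in for experimental values on a comparable scale.

```python
import math


def linear_correlation(x, y):
    """Return (slope, R2) of the least-squares line y = slope * x + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return sxy / sxx, r * r


# Hypothetical values for illustration only (not data from the study)
predicted_dg = [-9.8, -9.1, -8.7, -8.2, -7.9]
experimental = [-8.9, -8.6, -8.8, -7.7, -8.1]
slope, r2 = linear_correlation(predicted_dg, experimental)
print(f"slope = {slope:.2f}, R2 = {r2:.3f}")
```

Reporting the sign of the slope alongside R2 matters for the comparison between sites, because R2 alone cannot distinguish a weak positive trend from a weak negative one.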
As shown in more detail in Figure 8, the derivatives from both series are well modeled at Site 2, and the binding mode can rationalize the SAR of the functional group substitutions at all three R groups (R, R1 and R2). The model can rationalize the SAR at R, where (O-CH3 > Cl > F). The reference Cl atom forms not hydrophobic interactions but close and favorable hydrophilic interactions with the positively charged NZ atom of the K1038 side chain, while the phenyl ring forms favorable hydrophobic interactions with the aliphatic side-chain atoms of K1038 (CB, CG, CD, CE). Thus, the R group is partially solvent exposed and close enough to the NZ atom of the K1038 side chain for electrostatic interactions. The O-CH3 substitution forms favorable interactions, but the F atom exhibits a weaker interaction with the positively charged NZ atom than Cl does. Thus, the model is able to rationalize the most important substitutions leading to favorable R groups.
Next, the model can explain the series of substitutions at R1, where the phenyl ring is buried in a hydrophobic pocket formed primarily by the side chains of A890 and Y1047 on one side and V1040 on the other. At R1, para Cl or F substitutions are found in the more favorable derivatives, such as 15g. The favorability of F over Cl is readily rationalized by its position at the back of the hydrophobic pocket, where it makes close interactions with dipolar backbone atoms, 2.54 Å from (G1046@HN) and 3.19 Å from (D1041@OD1). Other R1 substitutions are also rationalized in this binding mode, such as O-CF3 (7i) being more favorable than O-CH3 (7g): one of the electronegative fluorine atoms of the CF3 group of 7i forms a favorable electrostatic interaction with the backbone amide H1048@HN, so the isosteric CH3 substitution is less favorable. Finally, the model is also able to rationalize the series of substitutions at R2, namely that the isopropyl group is more favorable than the cyclopropyl group, as demonstrated with derivatives 7a and 15h. For derivatives 7a and 15h, the cyclopropyl carbon atoms are less favorable because they are closer to the polar side-chain atoms of R1107 and N1108. The smaller isopropyl group lacks these unfavorable interactions, and its carbon atoms bind slightly closer to the aromatic carbon atoms of W886. In summary, the derivative series, when modeled at Site 2, forms complementary protein-ligand interactions that are able to explain the substitutions at R, R1 and R2.
3.5. Modeling a Series of Clofazimine Derivatives Binding to Other SARS-CoV-2 Targets
To increase confidence in our comparison with the derivative series SAR data, the derivatives were also independently docked at binding sites of other SARS-CoV-2 target proteins. As mentioned previously, Clofazimine 2 has been reported to be a dual-targeted SARS-CoV-2 antiviral [5], with Spike-dependent fusion inhibition activity as well as activity against Nsp13 helicase unwinding [5]. Interestingly, the same research group also measured no activity for Clofazimine 2 in an assay of Nsp5 Main protease (Mpro) activity [5]. As we had previously published maps of thermodynamically favorable binding sites for Nsp5 Mpro, Nsp13 helicase and Nsp16 2’-O methyltransferase [19], we chose to model the derivative series at the most favorable site identified for Clofazimine 2 on each of these targets. Thus, Nsp5 Mpro and Nsp16 are “negative control” proteins, where we would expect no correlation with experimental SAR data, particularly since 2 has been reported to have no inhibitory activity against Nsp5 Mpro. As expected, modeling the series of 18 derivatives at both Nsp5 Mpro and Nsp16 as “negative control” binding sites resulted in poor agreement with the experimental SAR data as well as less favorable predicted (ΔGbind) values. Modeling the series at Nsp5 Mpro, the “untrained” predicted ΔGbind values exhibited a negative correlation (a negative slope) with a very low correlation coefficient, R2 = 0.014, for all 18 compounds (Figure 7A). This agrees with the report that Clofazimine 2 has no inhibitory activity against Nsp5 Mpro [5]. Modeling the series at Nsp16, the “untrained” predicted ΔGbind values exhibited a negative correlation with a low correlation coefficient, R2 = 0.056, for all 18 compounds (Figure 7A). Interestingly, the results in Figure 7A show that when the series of 18 derivatives is modeled at the most favorable site identified for Clofazimine 2 on the Nsp13 helicase (see Supplementary Figure S1), the “untrained” predicted ΔGbind values did exhibit some correlation (R2 = 0.141) with the experimental EC50 values, but less than at Site 2 on the S2 segment (R2 = 0.264).
To summarize, the comparison with docking data at the “decoy” binding sites of other target proteins also strengthens the conclusion that the series of Clofazimine 2 derivatives is best modeled at Site 2 on the S2 segment, rather than at Site 1. When the series is modeled at all 5 binding sites, the only sites with reasonable R2 correlation values and positive slopes are Site 2 on the S2 segment (R2 = 0.264) and the Nsp13 helicase site (R2 = 0.141).