2.1. Intrinsic Disorder Status of Members of Human Synuclein Family
The amino acid sequences of all the synucleins analyzed in this study are listed in
Supplementary Table S1.
Figure 1,
Figure 2 and
Figure 3 represent the results of the intrinsic disorder-centric analysis of human α-, β-, and γ-synucleins, which consist of 140, 134, and 127 amino acids, respectively. It has been emphasized that one of the characteristic features of human synucleins is the presence of acidic stretches within their C-terminal regions, whereas within their 87 N-terminal residues, they possess a degenerate KTKEGV repeat that defines the hydrophobic variability of their sequences with a periodicity of 11 amino acids, which is characteristic of amphipathic helices [
128]. Although human α- and β-synucleins share 78% identical residues including conserved C-termini containing three identically placed tyrosine residues, β-synuclein lacks 11 residues (residues 73–83) within its middle region [
19]. There is 60% sequence similarity between human α- and γ-synucleins, with γ-synuclein lacking the tyrosine-rich C-terminal signature of α- and β-synucleins [
19].
Analysis of these figures provides compelling evidence of the highly disordered nature of all three members of the human synuclein family. Originally, researchers' interest in human α-synuclein was prompted by the finding that aggregation of this protein is related to the pathogenesis of Parkinson’s disease (PD), which is recognized as the most common aging-related movement disorder and the second most common neurodegenerative disease after Alzheimer’s disease (AD). It is estimated that ~1.5 million Americans are affected by PD. Sporadic (or idiopathic) forms of this disease account for about 95% of PD patients [
129,
130]. The probability of sporadic PD development increases with age, with only a small percentage of patients diagnosed before the age of 50 [
131]. The prevalence of PD is much greater among those who are at least 65 years old [
132]. Approximately 1% of the population at 65–70 years of age is affected by PD, whereas the number of PD patients increases to 4–5% in 85-year-olds [
133]. In addition to the sporadic form, multiple familial forms of PD are associated with mutations in a number of genes. These hereditary forms account for ~4% of PD patients, who develop early-onset disease before the age of 50 [
134,
135]. The pathological hallmark of PD is the presence of cytosolic filamentous inclusions known as Lewy bodies (LBs) and Lewy neurites (LNs) in surviving dopaminergic neurons within the
substantia nigra [
8,
9]. These inclusions, which contain aggregated forms of α-synuclein, can also be found in other parts of the brain [
136] and are associated with the pathogenesis of various synucleinopathies [
25,
26,
27,
28,
29,
30,
31,
32,
33] characterized by the presence of the common pathologic inclusions composed of aggregated α-synuclein, which are deposited in selectively vulnerable neurons and glia [
17,
18,
23,
38]. Finding α-synuclein in LBs and LNs [
32,
37], as well as the existence of the specific missense mutations in the SNCA gene, corresponding to the A30P, E46K, and A53T substitutions in the α-synuclein protein in autosomal dominant early-onset forms of PD [
137,
138,
139], and a link of other early-onset PD forms to the hyper-expression of wild type α-synuclein due to the gene duplication/triplication [
140,
141,
142] strongly implicated α-synuclein in PD pathogenesis.
The α-synuclein sequence is assumed to contain three functional regions: the N-terminal region (residues 1−60), which contains four 11-amino acid imperfect repeats with a conserved motif (KTKEGV; residues 10-15, 21-26, 32-37, and 43-48); the central region (residues 61−95), which contains three additional repeats (residues 58-63, 69-74, and 80-85) and is known as the highly amyloidogenic non-Aβ component (NAC) region, originally found in amyloid plaques associated with AD [
118]; and the highly charged C-terminal region (residues 96−140), which is involved in protein-protein interactions. Note that the N-terminal and central regions together comprise a lipid-binding domain. Detailed experimental analysis of purified α-synuclein
in vitro provided strong evidence of the highly disordered nature of this protein [
3,
4,
6,
143]. However, it was also indicated that the structure of α-synuclein does not represent a random coil but is characterized by the presence of transient long-range contacts within the protein [
9,
144,
145,
146].
In agreement with experimental data,
Figure 1A,B show that human α-synuclein is predicted to be highly disordered by most computational tools utilized in this study. Furthermore,
Figure 1B shows that the C-terminal region of this protein contains two molecular recognition features (MoRFs, which are disordered regions that can undergo binding-induced folding upon interaction with specific partners) (residues 87-94 and 111-140), and the entire protein is heavily decorated by multiple PTMs, clearly indicating the crucial functional role of its intrinsic disorder.
Figure 1C shows that human α-synuclein is characterized by a high liquid-liquid phase separation (LLPS) potential. Its probability of spontaneous liquid-liquid phase separation (pLLPS) value of 0.6249 exceeds the threshold of 0.6, indicating that α-synuclein can act as a droplet-driver capable of undergoing LLPS spontaneously [
147]. Furthermore, the C-terminal region of this protein contains a long droplet-promoting region (DPR, residues 101-140) that also includes an aggregation hot-spot (residues 115-123), defined as a region capable of promoting the conversion of the liquid-like condensed state into a solid-like amyloid state [
148]. This predicted LLPS potential of human α-synuclein is in line with the experimentally demonstrated capability of this protein to undergo LLPS [
149,
150,
151,
152,
153].
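The FuzDrop-based reading of these values can be summarized as a simple decision rule. The sketch below is only illustrative: the 0.6 threshold and the pLLPS values are those quoted in this section, the "client" rule (pLLPS below the threshold but a DPR present) follows the way that term is applied to β-synuclein later in this section, and `LlpsProfile` and `llps_role` are hypothetical helper names, not part of FuzDrop.

```python
from dataclasses import dataclass

P_LLPS_THRESHOLD = 0.6  # droplet-driver threshold quoted in the text


@dataclass(frozen=True)
class LlpsProfile:
    """Hypothetical container for the FuzDrop-style outputs discussed here."""
    name: str
    p_llps: float   # probability of spontaneous LLPS
    has_dpr: bool   # True if at least one droplet-promoting region is predicted


def llps_role(profile: LlpsProfile) -> str:
    """Assign a coarse LLPS role from pLLPS and the presence of a DPR."""
    if profile.p_llps >= P_LLPS_THRESHOLD:
        return "droplet driver"
    if profile.has_dpr:
        return "droplet client"
    return "no predicted LLPS role"


if __name__ == "__main__":
    # pLLPS values as reported in this section for the two proteins.
    alpha = LlpsProfile("alpha-synuclein", p_llps=0.6249, has_dpr=True)
    beta = LlpsProfile("beta-synuclein", p_llps=0.5427, has_dpr=True)
    for protein in (alpha, beta):
        print(protein.name, "->", llps_role(protein))
```

The rule deliberately ignores aggregation hot-spots, which the text treats as a separate feature of the droplet-promoting region.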
Curiously,
Figure 1D shows that human α-synuclein is expected to contain multiple regions with context-dependent interactions (residues 3-13, 15-75, 77-92, 94-105, and 115-123), i.e., regions exhibiting ordered or disordered binding modes depending on the cellular context (environment, sub-cellular localization, partners, and PTMs). These regions are capable of engaging in multiple binding modes in a cellular context-dependent manner [
154]. Data shown in
Figure 1B,
D indicate that human α-synuclein is predisposed to be a promiscuous binder, as almost its entire sequence can act as a potential binding platform. In line with this conjecture,
Figure 1E shows that α-synuclein can interact with 356 proteins, forming a very dense protein-protein interaction network, the 357 members of which are connected by 7,316 interactions. This network is characterized by an average node degree of 41 and an average local clustering coefficient of 0.639. Since the expected number of edges in a random set of proteins of the same size and degree distribution drawn from the genome is 2,946, this α-synuclein-centric network has significantly more interactions than would be expected (its PPI enrichment p-value is < 1.0e-16). The five most enriched biological processes, molecular functions, and cellular components (as per Gene Ontology annotations) of the members of this network, as well as the most enriched local STRING network clusters and KEGG pathways, are listed in
Table 1.
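The reported average node degrees follow directly from the node and edge counts, since in an undirected network every edge contributes to the degree of two nodes (average degree = 2E/N). The sketch below recomputes this quantity, together with a simple observed-to-expected edge ratio, for the four networks described in Sections 2.1 and 2.4. The function names are illustrative assumptions, and the ratio is only a descriptive summary: it does not reproduce the PPI enrichment p-value calculated by STRING.

```python
def average_node_degree(n_nodes: int, n_edges: int) -> float:
    """Average degree of an undirected network: each edge touches two nodes."""
    return 2 * n_edges / n_nodes


def edge_enrichment(observed_edges: int, expected_edges: int) -> float:
    """Ratio of observed to expected edges (descriptive only, not a p-value)."""
    return observed_edges / expected_edges


# (nodes, observed edges, expected edges) as quoted in the text.
NETWORKS = {
    "alpha-synuclein": (357, 7316, 2946),
    "beta-synuclein": (85, 715, 143),
    "gamma-synuclein": (32, 117, 46),
    "joint alpha-beta-gamma": (469, 10731, 4889),
}

if __name__ == "__main__":
    for name, (nodes, observed, expected) in NETWORKS.items():
        degree = average_node_degree(nodes, observed)
        ratio = edge_enrichment(observed, expected)
        print(f"{name}: average degree {degree:.2f}, observed/expected {ratio:.2f}")
```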
Figure 1F shows the 3D structure of human α-synuclein modeled by AlphaFold. According to this model, α-synuclein does not have a compact core, with the only structured element predicted in this protein being a long α-helix spanning residues 1-91. This is a rather unrealistic structure, since long α-helices typically cannot exist in isolation and need to be stabilized by interactions either with the compact protein core or via binding to specific partners, such as other proteins, nucleic acids, or membranes. Therefore, it is likely that in this case, AlphaFold predicts the 3D structure of a bound form of α-synuclein. In fact, comprehensive experimental analysis of purified α-synuclein
in vitro using a multitude of techniques sensitive to different levels of protein structural organization revealed that this protein is highly disordered [
3,
4,
6,
143]. Although transient long-range interactions were observed within this protein [
9,
144,
145,
146], solution NMR analysis did not show the presence of any stable structural elements in the unbound form of this protein. However, this protein has been shown to adopt secondary structure of mostly helical nature upon interaction with negatively charged small unilamellar vesicles (SUVs) or detergent micelle surfaces [
3,
5,
155,
156], and α-helical structure was induced in this protein in the presence of lipids [
157] and organic solvents [
158]. Furthermore, binding of α-synuclein to a micelle of the detergent sodium lauroyl sarcosinate (SLAS) was shown to be accompanied by the disorder-to-order transition resulting in the formation of two antiparallel micelle-bound α-helices (residues 1-31 and 41-91) [
159]. In agreement with this NMR-EPR based study, solution NMR analysis of the micelle-bound form of α-synuclein revealed the presence of the two anti-parallel curved α-helices (residues 3-37 and 45-92) connected via an extended but well-ordered linker [
160].
Similar to α-synuclein, human β-synuclein is predicted to contain high levels of intrinsic disorder (see
Figure 2). The major difference between these two proteins is the lack of 11 residues (residues 73–83) within the middle region of β-synuclein [
19]. As a result, the overall percentage of disordered residues (as per PONDR® VSL2 analysis) decreases from 90.71% in α-synuclein to 87.31% in β-synuclein. In contrast, the average prediction score increases from 0.7199 in α-synuclein to 0.7342 in β-synuclein (see
Figure 2A).
Figure 2B shows that human β-synuclein, being predicted as mostly disordered by all the tools included in the D²P²-based analysis, is expected to have three MoRFs (residues 1-9, 65-89, and 100-134), indicating that intrinsic disorder plays a crucial role in its interactability. Furthermore, the function of β-synuclein can be modulated by various PTMs. At the same time, this protein has lost the capability to undergo spontaneous LLPS (its pLLPS of 0.5427 is below the threshold of 0.6) together with the aggregation hot spot. However, β-synuclein can still act as a droplet client, since it has a long DPR (residues 95-134) at its C-terminal tail (see
Figure 2C). As per
Figure 2D, human β-synuclein contains four regions with context-dependent interactions (residues 8-19, 21-58, 78-87, and 92-98). Therefore, this protein is also expected to act as a highly promiscuous binder. The idea is supported by
Figure 2E, which shows the β-synuclein-centered PPI network generated by STRING, containing 85 nodes connected by 715 edges. The average node degree of this network is 16.8, and its average local clustering coefficient is 0.682. Furthermore, this network has significantly more interactions than expected (715 vs. 143), being characterized by a PPI enrichment p-value of < 1.0e-16. The five most enriched biological processes, molecular functions, and cellular components (as per Gene Ontology annotations) of the members of this network, as well as the most enriched local STRING network clusters and KEGG pathways, are listed in
Table 1. Among the functional differences between the members of the α-synuclein- and β-synuclein-centered PPI networks is a remarkable change in the KEGG pathways, from exclusively disease-oriented pathways in the α-synuclein-centered network (PD, ALS, AD, Prion disease, and Huntington’s disease) to the Synaptic vesicle cycle, PD, Nicotine addiction, Serotonergic synapse, and Insulin secretion pathways in the β-synuclein-centered PPI network.
Similar to α-synuclein, human β-synuclein was shown experimentally to be extensively disordered [
6,
8,
9,
72], with β-synuclein being somewhat more disordered than α-synuclein [
6]. These experimental observations are supported by the results of our computational analysis.
Figure 2F represents the AlphaFold-generated 3D structural model of human β-synuclein, showing the presence of a single, long, horseshoe-like α-helix (residues 2-80). Solution NMR analysis of this protein in the unbound form revealed that its residual structure noticeably differs from that of α-synuclein, with the helical propensity of β-synuclein being clearly reduced between residues 66 and 83 [
9]. This difference in the residual structure of the unbound state was shown to propagate to its micelle-bound form, as the NMR analysis revealed that although the lipid-binding domain of β-synuclein, which is missing 11 residues, remains predominantly helical in the micelle-bound form and preserves the break around position 42, it is characterized by a dramatic decrease in the stability of the helical structure within the 65-83 region [
8].
Figure 3 shows that human γ-synuclein (which is different from other members of the human synuclein family by the absence of the tyrosine-rich C-terminal signature [
19]) is also predicted to be a highly disordered protein. In fact, it seems to be the most disordered member of the family, since its overall percentage of disordered residues (as per PONDR® VSL2 analysis) is 100% and its average prediction score is 0.8328 (see
Figure 3A).
Figure 3B represents the functional disorder profile of human γ-synuclein generated by the D²P² platform and also shows the high prevalence of disorder in this protein, which is also expected to have three MoRFs (residues 1-10, 68-77, and 87-97) and several PTMs. As per FuzDrop analysis (see
Figure 3C), γ-synuclein is not expected to undergo spontaneous LLPS but can serve as a droplet client, and also contains an aggregation hot-spot (residues 94-106). These features make this protein closer to α-synuclein than to β-synuclein. This hypothesis is supported by experimental analyses that revealed the closer structural similarity of these two proteins [
6,
9,
161]. The decreased aggregation potential of γ-synuclein in comparison with that of α-synuclein was attributed to an increased α-helical propensity in the amyloid-forming region that is critical for α-synuclein fibrillation, suggesting that increased structural stability in this region may protect against γ-synuclein aggregation [
161].
Figure 3D shows the presence of four regions with context-dependent interactions (residues 4-66, 70-75, 83-89, and 94-106). Two of these regions overlap with MoRFs.
Figure 3E represents the γ-synuclein-centered PPI network containing 32 nodes and 117 edges. Although this network is the smallest one among the synuclein family members, it still has significantly more interactions than expected (117 vs. 46). It is characterized by a PPI enrichment p-value of < 1.0e-16, an average node degree of 7.31, and a high average local clustering coefficient of 0.752. The five most enriched biological processes, molecular functions, and cellular components (as per Gene Ontology annotations) of the members of this network, as well as the most enriched local STRING network clusters and KEGG pathways, are listed in
Table 1. Finally,
Figure 3F represents the 3D model of human γ-synuclein generated by AlphaFold. In line with all other data discussed in this section, this structural model is very similar to that generated for α-synuclein, with a single long α-helix (residues 2-91) being observed.
2.4. Functional Disorder Analysis of Human Proteins Engaged in Interaction with Members of Synuclein Family
At the next stage, we checked the prevalence of intrinsic disorder in human proteins involved in interactions with α-, β-, and γ-synucleins. PPI networks generated for individual proteins are shown in
Figures 1E,
2E, and
3E, whereas a global PPI network centered at all three synucleins is shown in
Figure 9. This network was generated using a confidence level of 0.45 as the minimum required interaction score. The network includes 469 proteins involved in 10,731 interactions, which significantly exceeds the 4,889 interactions expected in a random set of proteins of the same size and degree distribution drawn from the genome. The average node degree of this network is 45.8, whereas its average local clustering coefficient is 0.585. The five most enriched biological processes, molecular functions, and cellular components (as per Gene Ontology annotations) of the members of this network, as well as the most enriched local STRING network clusters and KEGG pathways, are listed in
Table 1.
Next, we compared the levels of intrinsic disorder in all these interactomes with the disorder status of all proteins in human brain. Results of this analysis are shown in
Figure 10, which clearly indicates that all analyzed protein sets contain noticeable levels of intrinsic disorder.
Figure 10A summarizes the results of this analysis in the form of the PONDR® VSL2 score vs. PONDR® VSL2 (%) plot. Based on the results of these analyses, proteins can be classified using the percent of predicted intrinsically disordered residues (PPIDR; i.e., percent of residues with the disorder score of 0.5 or higher). Here, a PPIDR value of less than 10% is taken to correspond to a highly ordered protein, PPIDR between 10% and 30% is ascribed to a moderately disordered protein, and PPIDR greater than 30% corresponds to a highly disordered protein [
174,
175]. In addition to PPIDR, the mean disorder score (MDS) was calculated for each query protein as a protein length-normalized sum of all the per-residue disorder scores. The resulting MDS values can be used for protein classification as highly ordered (MDS < 0.15), moderately disordered or flexible (MDS between 0.15 and 0.5), and highly disordered (MDS ≥ 0.5).
Figure 10B represents the results of global disorder analysis in the form of the ΔCH-ΔCDF plot that can be used for further classification of proteins as mostly ordered, molten globule-like or hybrid, or highly disordered based on their positions within the resulting CH-CDF phase space [
109,
176,
177,
178]. The results of the corresponding classification are summarized in
Table 2. This analysis revealed that although proteins in the joint α-β-γ synuclein interactome, and especially proteins interacting with human α-synuclein, are somewhat less disordered than proteins in the human brain proteome, interactors of β- and especially γ-synuclein are noticeably more disordered. In fact, as per PONDR® VSL2 analysis, all proteins interacting with β- and γ-synucleins are moderately or highly disordered.
Figure 11A and Table 2 provide further illustration of this observation and also show that, on average, most of the proteins in these various sets are classified as moderately or highly disordered, emphasizing the potential importance of intrinsic disorder for the functionality of these proteins.
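The PPIDR- and MDS-based classification described above can be sketched in a few lines. This is a minimal illustration assuming that a per-residue disorder profile is already available as a list of scores between 0 and 1 (for example, exported from a PONDR® VSL2 run); the function names and the toy profile are hypothetical, and the treatment of values lying exactly on a boundary is an assumption, since the text gives ranges only.

```python
from statistics import fmean
from typing import Sequence


def ppidr(scores: Sequence[float], cutoff: float = 0.5) -> float:
    """Percent of residues whose disorder score is at or above the cutoff."""
    return 100.0 * sum(score >= cutoff for score in scores) / len(scores)


def mean_disorder_score(scores: Sequence[float]) -> float:
    """Length-normalized sum of per-residue scores (MDS in the text)."""
    return fmean(scores)


def classify_by_ppidr(value: float) -> str:
    """Classes from the text: <10%, 10-30%, >30% (boundaries are assumed)."""
    if value < 10.0:
        return "highly ordered"
    if value <= 30.0:
        return "moderately disordered"
    return "highly disordered"


def classify_by_mds(value: float) -> str:
    """Classes from the text: <0.15, 0.15-0.5, >=0.5."""
    if value < 0.15:
        return "highly ordered"
    if value < 0.5:
        return "moderately disordered or flexible"
    return "highly disordered"


if __name__ == "__main__":
    # Made-up eight-residue profile, for illustration only.
    toy_scores = [0.12, 0.35, 0.48, 0.55, 0.71, 0.90, 0.93, 0.64]
    percent = ppidr(toy_scores)
    mds = mean_disorder_score(toy_scores)
    print(f"PPIDR = {percent:.1f}% ({classify_by_ppidr(percent)})")
    print(f"MDS = {mds:.3f} ({classify_by_mds(mds)})")
```

As a point of reference, the 90.71% reported for α-synuclein in Section 2.1 is consistent with 127 of its 140 residues scoring at or above 0.5, and the 87.31% reported for β-synuclein with 117 of 134 residues.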
Next, we examined the interactability of different proteins from the joint α-β-γ synuclein interactome and compared the node degrees of these proteins with their disorder status. Results of this analysis are shown in
Figure 11B. In this network, almost half of the proteins (207 of 467, 44.3%) have more than 47 interactors each, indicating that these proteins can be considered hubs. These hub proteins are characterized by a mean node degree of 76±41 and a mean PPIDR of 37.8±22.8%. Curiously, the 60 proteins with the fewest interactors (10 or fewer partners each) were characterized by a mean node degree of 6.0±2.7 and a mean PPIDR of 51.4±25.9%. On the other hand, the 60 most connected proteins were characterized by a mean node degree of 123±43 and a mean PPIDR of 43.1±21.4%. Curiously, the 60 most disordered proteins in this dataset had a mean node degree of 44.2±60.6 and a mean PPIDR of 87.5±8.4%, whereas the 60 most ordered proteins of this set were characterized by a mean node degree of 43.7±32.0 and a mean PPIDR of 11.1±11.6%. Taken together, these data indicate that, in general, proteins with lower disorder levels are expected to be engaged in slightly more interactions. However, the situation changes if one compares the 20 most ordered proteins (PPIDR of 7.5±1.8%) with the 20 most disordered proteins (PPIDR of 96.6±3.9%), as their mean node degrees were 40.0±19.5 and 57.0±92.8, respectively.
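Comparisons of this kind (hubs versus non-hubs, or the n most and least connected or disordered proteins) reduce to ranking a table of per-protein values and summarizing each subset as mean ± standard deviation. The sketch below shows one way this could be done. The records are invented toy data, the field names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

Record = Dict[str, float]


def summarize(records: List[Record], key: str) -> Tuple[float, float]:
    """Mean and sample standard deviation of one field over a subset."""
    values = [record[key] for record in records]
    return mean(values), stdev(values)


def extremes(records: List[Record], sort_key: str, n: int) -> Tuple[List[Record], List[Record]]:
    """Return the n lowest-ranked and n highest-ranked records for a field."""
    ranked = sorted(records, key=lambda record: record[sort_key])
    return ranked[:n], ranked[-n:]


def hubs(records: List[Record], min_degree: float) -> List[Record]:
    """Proteins whose node degree exceeds the chosen hub threshold."""
    return [record for record in records if record["degree"] > min_degree]


if __name__ == "__main__":
    # Invented values for illustration; not taken from the interactome.
    toy = [
        {"degree": 5, "ppidr": 60.0},
        {"degree": 120, "ppidr": 40.0},
        {"degree": 50, "ppidr": 10.0},
        {"degree": 8, "ppidr": 80.0},
    ]
    least_connected, most_connected = extremes(toy, "degree", n=2)
    for label, subset in (("least", least_connected), ("most", most_connected)):
        degree_mean, degree_sd = summarize(subset, "degree")
        ppidr_mean, ppidr_sd = summarize(subset, "ppidr")
        print(f"{label} connected: degree {degree_mean:.1f}±{degree_sd:.1f}, "
              f"PPIDR {ppidr_mean:.1f}±{ppidr_sd:.1f}%")
    print("hub count (degree > 47):", len(hubs(toy, 47)))
```

Ranking by `"ppidr"` instead of `"degree"` gives the corresponding most ordered and most disordered subsets.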
We also looked for a correlation between the overall disorder status, interactability, and LLPS predisposition of human proteins in the joint α-β-γ synuclein interactome. Results of this analysis are summarized in
Figure 12, which shows the corresponding outputs in the form of a 3D plot. This analysis revealed that proteins predicted by FuzDrop to be droplet drivers (i.e., possessing pLLPS ≥ 0.6) are on average more disordered than proteins that are not capable of spontaneous liquid-liquid phase separation. In fact, 130 proteins with pLLPS ≥ 0.6 were characterized by a mean PPIDR of 66.3±19.5%, whereas the remaining 337 proteins from the joint human α-β-γ synuclein interactome were characterized by a mean PPIDR of 31.4±18.4%. On the other hand, LLPS drivers and non-drivers did not show a noticeable difference in their within-network interactivity: within the joint human α-β-γ synuclein interactome, their mean node degrees were 40.4±49.3 (drivers) and 48.0±35.6 (non-drivers), respectively. Comparative analysis of the 130 most disordered proteins revealed that they are characterized by a mean PPIDR of 74.5±14.0%, a mean node degree of 41.7±48.2, and a mean pLLPS of 0.753±0.295. The remaining 337 proteins are characterized by a mean PPIDR of 28.3±12.4%, a mean node degree of 47.4±35.2, and a mean pLLPS of 0.311±0.229. Comparative analysis of the 130 most connected proteins, with a mean node degree of 90.8±45.6, revealed that they are characterized by a mean PPIDR of 38.4±22.2% and a mean pLLPS of 0.379±0.273. The remaining, less interactive human proteins in the joint α-β-γ synuclein interactome have a mean node degree of 28.6±16.3, a mean PPIDR of 42.2±25.2%, and a mean pLLPS of 0.456±0.331.
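Because each of the three comparisons above splits the same 467 proteins into a group of 130 and a group of 337, the subgroup means of any one quantity should recombine to the same overall mean. The sketch below performs this size-weighted recombination for the node degrees quoted above. It is a rough arithmetic cross-check rather than part of the original analysis, and the function name is an illustrative assumption. Exact agreement with the network-level average node degree of 45.8 is not expected, since that value was reported for the full 469-node network.

```python
from typing import Iterable, Tuple


def pooled_mean(groups: Iterable[Tuple[int, float]]) -> float:
    """Size-weighted mean of subgroup means; groups are (size, mean) pairs."""
    groups = list(groups)
    total = sum(size for size, _ in groups)
    return sum(size * value for size, value in groups) / total


# (group size, mean node degree) pairs as quoted in the text.
SPLITS = {
    "pLLPS >= 0.6 vs. rest": [(130, 40.4), (337, 48.0)],
    "most disordered vs. rest": [(130, 41.7), (337, 47.4)],
    "most connected vs. rest": [(130, 90.8), (337, 28.6)],
}

if __name__ == "__main__":
    for label, groups in SPLITS.items():
        print(f"{label}: pooled mean node degree = {pooled_mean(groups):.2f}")
```

All three splits recombine to a pooled mean node degree close to 45.9, in line with the network-level value.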