Biophysical Basis of Multi-Functionality in IDPs
Intrinsically disordered proteins (IDPs) lack a stable three-dimensional fold and, with it, recognizable structured domains. Unlike well-folded globular proteins, whose energy landscapes have a deep well (the global minimum), IDPs map to rugged energy landscapes with many near-equivalent local minima. This confers structural plasticity (flexibility), which is why IDPs are represented as conformational ensembles rather than as a single structure. This plasticity in turn underlies their characteristic binding promiscuity towards different partners, which aids multi-functionality [8], without their showing any overall preference for chaperone binding in vivo [9]. Their binding promiscuity stems from a phenomenon called 'coupled folding and binding' [10], in which they undergo transitions from disorder to different structured states upon binding to different partners. The root cause of this phenomenon lies in their non-disjoint, flexible backbone trajectory, which enables them to accommodate different combinations of side-chain rotamers complementary to the surfaces of different ordered protein partners [5]. In globular proteins, folding and binding are analogous processes that can be bridged by concepts such as complementarity (in geometry and electrostatics) [11]. IDPs, by contrast, continually move across their rugged energy landscape in search of suitable intra- and/or intermolecular interactions to stabilize them [12,13]. Eukaryotic IDPs remain disordered under normal physiological conditions and fold into ordered structures only when they come into contact with their 'cellular targets' [14,15,16,17]. It has been theorized that a disordered protein binds weakly and non-specifically to its target and aligns structurally to a complementary surface (becoming structured) as it approaches the cognate binding site(s) [18]. In many cases, especially in signal transduction pathways, such binding is essentially transient (metastable). IDPs can also escape protein degradation through a co-translational folding mechanism involving the ribosomal surface and molecular chaperones [19,20]. These adaptabilities enable IDPs to engage in numerous cellular processes, contributing to their multi-functionality despite their lack of a defined structure. Importantly, the intrinsically disordered regions (IDRs) of proteins also serve as promising (fuzzy) drug targets [21,22]. A new approach to drug development has recently emerged that uses an acceptable representation of the conformational ensemble [22] as the receptor surface (in contrast to the well-demarcated drug-binding pockets of folded proteins), thereby increasing the interaction cross-section available to ligands (drugs). The promiscuous binding of IDRs is also being exploited for drug targeting: in castration-resistant prostate cancer, for example, the disordered N-terminal domain of the androgen receptor is being targeted to overcome existing drug resistance [23]. The formation of binding-competent transient structures (conformational clusters), induced by molecular crowding in the close vicinity of IDPs/IDRs, is another distinctive mechanism underlying binding promiscuity and multi-functionality, as demonstrated for the intrinsically disordered Gab-family protein Gab1 during signal transduction [24]. These clusters are presumed to contribute to the spatial organization of complex components. Such observations may help decipher how very large, stimulus-specific signal transduction complexes (viz., 'signalosomes') assemble within a short period of time in response to particular stimuli [19].
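The contrast between a single deep well and a rugged landscape with many near-equivalent minima can be made concrete with Boltzmann statistics. The sketch below is illustrative only: the minimum energies are invented numbers, the thermal energy (kT of about 0.593 kcal/mol, roughly 298 K) is an assumption, and real landscapes are multidimensional rather than a short list of minima. It shows that one deep minimum captures almost the entire population, whereas near-degenerate minima spread the population across an ensemble.

```python
import math

KT = 0.593  # assumed k_B*T in kcal/mol (about 298 K)

def boltzmann_populations(energies, kT=KT):
    """Normalized Boltzmann weights for a list of minimum energies (kcal/mol)."""
    e_min = min(energies)
    # Shift by the lowest energy so exp() never overflows.
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)  # partition function over the listed minima only
    return [w / z for w in weights]

def ensemble_entropy(populations):
    """Shannon entropy (nats) of the populations; 0 means one state dominates."""
    return -sum(p * math.log(p) for p in populations if p > 0)

# Hypothetical minima: one deep well vs. several near-degenerate shallow wells.
globular = [-10.0, -4.0, -3.5, -3.0, -2.5]
idp_like = [-3.1, -3.0, -2.9, -3.0, -3.05]

for name, minima in [("globular-like", globular), ("IDP-like", idp_like)]:
    pops = boltzmann_populations(minima)
    print(name, "top population = %.3f" % max(pops),
          "entropy = %.2f nats" % ensemble_entropy(pops))
```

In the globular-like case the lowest minimum holds essentially all of the population, while in the IDP-like case no single minimum dominates, which is the sense in which such a protein is better described as an ensemble.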
Weaponry of Evolved Protein Multi-Functionality
Evolved protein multi-functionality draws on several weapons (molecular evolutionary strategies) in its armoury (Figure 1). The following is a comparative discussion of these evolutionary tools and strategies.
1. Gene Duplication & Functional Divergence: Gene duplication creates redundant gene copies, allowing one copy to retain the original function while the paralog (which often differs in oligomeric state [25]) accumulates mutations at a higher rate and, according to the classical model of divergence by neo-functionalization, is often fixed in the population by acquiring an adaptive function [31]. To that end, accelerated evolution of retained paralogs (e.g., the Rck1/Rck2, Ptc2/Ptc3, Sim1/Sun4 and Ktr5/Ktr6 paralogs [32]) has been observed to proceed through evolving post-translational regulation mechanisms (diversified short linear motif-like sequences) [33]. Under the alternative model of sub-functionalization, the biological functions of the ancestral gene are partitioned between the two paralogs after duplication and divergence. Sub-functionalization may be of two types: qualitative and quantitative. In qualitative sub-functionalization, molecular functions that traded off against each other in the ancestral gene are partitioned between the paralogs, and each paralog may then evolve towards optimizing the function it retains. In quantitative sub-functionalization, neutral evolution leads to complementary loss-of-function mutations in the two paralogs; both duplicates then become indispensable, as together they meet the ancestral functional requirements [34,35,36].
2. Domain Shuffling: Reorganization of protein domains can create multi-functional proteins by combining existing functional units in new ways. It may arise through horizontal gene transfer (e.g., from eukaryotes to prokaryotes) or through insertion/deletion (indel) mutations of genes following duplication. A common route by which domain shuffling leads to novel functions is the shuffling of exons (exon shuffling, analogous to alternative splicing at the mRNA level) followed by indel mutations. Exon shuffling is usually established by mapping exons to domains (e.g., a single exon coding for a single complete domain) [4]. Insertion of a 'nested' domain may also interrupt the linear sequence of a structural domain; such insertions often map to disordered loops in the parent structure. In phospholipase Cγ, for example, an insert of ~300 residues (comprising one SH3 and two SH2 domains) splits one of its two pleckstrin homology (PH) domains [37]. Certain domains (e.g., the Xlink domain) of aggrecan [26], the most abundant noncollagenous protein in cartilage, are also thought to have been created by domain shuffling in ancestral vertebrates.
3. Protein Moonlighting: In contrast to gene fusion, alternative splicing, or functional peptides generated by multiple proteolytic cleavages, protein moonlighting [28,38] refers to multi-functionality evolved in proteins (especially enzymes) without any change in their primary sequence. The additional function is typically expressed via sites other than the primary active site, which often maps to a catalytic pocket [39,40]. In these proteins, classic and non-classic protein functions co-exist: the former are enzymatic activities (i.e., involving covalent bond breaking and making), while the latter are protein-protein interactions mediated by other parts of the protein's surface. These alternative sites are, however, distinct from the allosteric sites of proteins such as phosphofructokinase and hemoglobin. Heat shock proteins (HSPs) are classic examples of protein moonlighting.
4. Fold-switching Proteins: Fold-switching proteins [27], a newly recognized class of proteins, switch between distinct folds by remodelling their secondary structures in response to changes in environmental (physiological) conditions, for example a change in pH [41]. Fold switching allows them to respond to cellular stimuli by performing important alternative regulatory functions in the cell (e.g., transcriptional regulation), as demonstrated in proteins such as RfaH and KaiB [42,43].
5. Adaptive Evolution: Environmental pressures, acting through natural selection, drive proteins to adapt and acquire new functions that enhance an organism's fitness, while genetic drift also influences which variants persist. Adaptive mutations are largely amino acid substitutions at the protein surface, where highly solvent-accessible, exposed residues are the most prone to mutation. Population genomics studies in model systems (Drosophila and Arabidopsis) that surveyed a multitude of genomic, structural and functional descriptors revealed that (i) the rate of adaptive substitution differs among functional classes, with the fastest rates of protein adaptation observed in proteins involved in translation, degradation and signalling, and (ii) intermolecular interactions (e.g., host-pathogen coevolution) are a major determinant of adaptive evolution [44]. Multifunctional viral proteins are classic examples of adaptive evolution [45]. A recent case is the Spike protein of SARS-CoV-2, which rapidly accumulated mutations, particularly in the solvent-exposed disordered loop containing the crucial furin-like cleavage site (FLCS_Spike) [46,47], as the virus gave rise to variants such as Omicron [48] and Deltacron [49]. Covariation analyses have also identified significant patterns of co-occurring adaptive events in the functionally overlapping RNA-binding domains of the potyviral HC-Pro protein [45].
6. Intrinsically Disordered Proteins: IDPs are biological soft matter [30] that is highly dynamic and biologically active [50]. Unlike globular proteins, they lack enough hydrophobic residues to trigger a hydrophobic collapse; instead, they are rich in polar and charged residues [50,51,52,53] and have low sequence complexity [50,53]. Their conformations therefore show only partial, transient order, maintained by hydrogen bonds, water-mediated contacts (indirect readouts) [54] and transient, interchangeable salt bridges [52]. IDPs do not have the characteristic deep well in their energy landscapes that globular proteins have, which means they do not adopt a single stable 3D structure under physiological conditions. Instead, they have a propensity to undergo transitions from disorder to order and back to disorder [51,52], which makes them highly flexible and adaptable. Partially disordered proteins contain intrinsically disordered regions of varying extent; such hybrid proteins have both ordered and disordered regions [50,51]. A classic example is p53 [55].
Recurrent salt bridges (especially those with short-range contact orders) impart local, transient structural rigidity to IDPs. However, the number of salt bridges must be balanced so that flexibility is preserved and the complete rigidity seen in globular proteins is avoided [52]. Studies [30,46,52] have demonstrated that salt bridges in IDPs are typically not stable (persistent) and tend to break and reform frequently with various interchangeable counter-ionic partners, a phenomenon referred to as 'transient salt bridge dynamics'. These dynamics accommodate the high occurrence of oppositely charged residues and allow the sampling of different conformations, giving rise to an ensemble. These conformations are not random but revolve around a finite number of structurally degenerate conformational clusters [30]. Phase transitions among these clusters are often triggered by the switching of transient salt bridges, demonstrating critical behavior similar to that of a sand-pile model. These transient, or flitting, salt bridges may stabilize an IDP in a conformation-dependent manner, with the conformation then locked in by complementary surfaces of its globular partners [30]. This is especially relevant in cell signalling, as in suppressors of cytokine signalling (SOCS) [30] and in eukaryotic transcription factors [53], whose IDRs evolve with high sequence heterogeneity and display dynamic multi-functionality through binding promiscuity.
IDPs, lacking a fixed structure or folding code, exist as highly dynamic 'dancing protein clouds' [50] that can adopt different shapes depending on their local environment. When IDPs interact with ordered proteins, binding induces at least partial folding, and the resulting fold depends on the binding partner. Different binding partners can therefore induce different folds [14,50], making IDPs highly adaptable. Additionally, IDPs exhibit fractal behaviour and heterogeneity: their conformations neither converge to a steady state nor diverge to infinity, but remain within a bounded, chaotic region.
7. Hub Proteins: Hub proteins [29] are well-structured proteins with a high (hub-like) degree in a protein-protein interaction (PPI) network. They can interact with multiple partners, even those associated with very different protein networks, leading to diverse biological processes. What sets hub proteins apart is their tendency to interact with disordered partners (IDPs/IDRs) [56], which allows them to bind structurally diverse partners [51]. In simpler terms, a hub protein can bind different Molecular Recognition Features (MoRFs) of its preferentially disordered partners, each of which folds into an ordered conformation upon binding. This feature makes hubs highly adaptable and valuable in a wide range of biological processes.
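The 'transient salt bridge dynamics' described under item 6 are usually quantified from molecular dynamics trajectories as per-frame distances between oppositely charged groups. The following minimal sketch assumes such distances are already available. The 4.0 Å cutoff is a commonly used convention rather than a value given in this text, and the residue names (Glu3, Asp14) and distances are hypothetical. It reports how often a salt bridge is formed (occupancy), how long each formed episode lasts, and which counter-ion a given lysine pairs with frame by frame, which is one simple way to make partner switching visible.

```python
CUTOFF = 4.0  # Å; assumed N-O distance cutoff for a formed salt bridge

def contact_series(distances, cutoff=CUTOFF):
    """Boolean time series: True in frames where the pair is within the cutoff."""
    return [d <= cutoff for d in distances]

def occupancy(series):
    """Fraction of frames in which the salt bridge is formed."""
    return sum(series) / len(series)

def lifetimes(series):
    """Lengths (in frames) of each uninterrupted run of the formed state."""
    runs, current = [], 0
    for formed in series:
        if formed:
            current += 1
        elif current:
            runs.append(current)  # a formed episode just ended
            current = 0
    if current:
        runs.append(current)  # episode still formed at the last frame
    return runs

def partner_switching(distance_by_partner, cutoff=CUTOFF):
    """Closest in-range counter-ion partner per frame (None if unpaired)."""
    n_frames = len(next(iter(distance_by_partner.values())))
    choice = []
    for t in range(n_frames):
        name, d = min(((p, ds[t]) for p, ds in distance_by_partner.items()),
                      key=lambda pair: pair[1])
        choice.append(name if d <= cutoff else None)
    return choice

# Hypothetical distances (Å) from one Lys side-chain N to two acidic groups.
lys_partners = {
    "Glu3":  [3.2, 3.5, 6.8, 7.1, 3.9, 3.3, 8.0, 7.5],
    "Asp14": [7.9, 6.5, 3.4, 3.1, 6.2, 7.0, 3.6, 3.8],
}
glu = contact_series(lys_partners["Glu3"])
print(occupancy(glu), lifetimes(glu))   # 0.5 [2, 2]
print(partner_switching(lys_partners))  # Lys alternates between Glu3 and Asp14
```

Low occupancy with short lifetimes, combined with frequent changes of partner, is the kind of signature the text describes as transient rather than persistent salt bridging.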
p53: An Example of a Unique Multi-Functional Hybrid Protein with Functionally Crucial IDRs
Hybrid proteins contain structured regions connected by disordered loops (i.e., IDRs). IDRs are closely associated with sequence diversity, which makes them well suited to regulatory functions. A prime example is p53, a protein found in both vertebrates and invertebrates, which has a unique structure-function mapping. Its primary function is tumor suppression through regulation of the cell cycle. However, it also has many other related non-enzymatic biological functions, such as protein-protein interactions (PPIs) and DNA binding. It can form different biologically active multimers, such as homo-tetramers and isoform-based hetero-tetramers. Additionally, it undergoes alternative splicing and carries many preferentially localized pre- and post-translational modifications, giving rise to various isoforms known as 'proteoforms'. These combinations, along with multiple disorder-based protein-binding sites, allow p53 to adopt metastable states upon interacting with many binding partners in a switch-like, transient manner characteristic of signal transducers and eukaryotic transcription factors [53,59]. This is possible because of the flexibility and sequence diversity offered by its IDRs. While acting as a tumor suppressor, it binds DNA via its highly conserved, well-structured DNA-binding domain, and the flanking and interconnecting IDRs often mediate transient binding to different partners [55]. Such IDRs, situated amid structured domains in hybrid proteins, have high amino acid substitution rates, leading to high sequence heterogeneity. The resulting structural heterogeneity can be categorized into foldons, inducible foldons, semi-foldons, non-foldons and unfoldons [59,66], underscoring their promiscuous binding capacity and their significance in PPI networks and signaling pathways. With over 1000 binding partners, p53 depends on its intrinsic disorder for its functionality. This intricate interplay between protein variation, intrinsic disorder and functionality underscores the complexity of the biological machinery, with implications for understanding disease pathogenesis and the regulation of cellular processes.
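Both the notion of a hub as a high-degree node in a PPI network and p53's very large number of binding partners come down to node degree in an interaction graph. The sketch below counts distinct partners per protein in an undirected edge list and flags hubs. The protein names, edges and degree threshold are hypothetical placeholders, not real interaction data, since what counts as a 'hub' depends on the network and the cutoff chosen.

```python
from collections import defaultdict

def degrees(edges):
    """Degree (number of distinct partners) of each protein in an undirected edge list."""
    partners = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions when counting partners
            partners[a].add(b)
            partners[b].add(a)
    return {protein: len(p) for protein, p in partners.items()}

def hubs(edges, min_degree):
    """Proteins whose degree meets a user-chosen (hypothetical) threshold."""
    return sorted(p for p, k in degrees(edges).items() if k >= min_degree)

# Toy network with placeholder names: one highly connected node plus a few side links.
toy_edges = [("HUB", f"P{i}") for i in range(1, 7)] + [("P1", "P2"), ("P3", "P4")]
print(degrees(toy_edges)["HUB"])       # 6
print(hubs(toy_edges, min_degree=5))   # ['HUB']
```

Using sets means a duplicated interaction record is counted once, which matters when edge lists are merged from several sources.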