2. Arginine in Protein Structure and Intrinsic Disorder
It has been reported that “arginine is an abundant (5.1%) amino acid… second most enriched amino acid in protein-protein interactions (after tryptophan)” [3]. In fact, according to the Composition Profiler portal (http://www.cprofiler.org/help.html) [4], where the amino acid compositions of standard datasets have been pre-computed as means and standard deviations over 100,000 bootstrap iterations, arginine accounts for 5.40 ± 0.04%, 4.93 ± 0.06%, and 6.56 ± 0.13% of residues in SwissProt 51 [5], PDB Select 25 [6], and a set of surface residues, respectively. The latter set was determined by the Molecular Surface Package over a sample of PDB structures of monomeric proteins and is suitable for analyzing phenomena on protein surfaces, such as binding [4]. Among all amino acids, arginine was shown to have the highest mutability in missense mutations linked to genetic disorders [7]. This is despite the fact that arginine can be encoded by six codons.
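The Composition Profiler values above are bootstrap estimates, i.e., means and standard deviations of a composition statistic recomputed over many resamples of a dataset. The Python sketch below illustrates the procedure for the arginine fraction. It is a minimal sketch under stated assumptions, not Composition Profiler's implementation: the function name `bootstrap_residue_fraction`, the toy sequences, and the choice to resample whole sequences with replacement (rather than individual residues) are all hypothetical choices made for illustration.

```python
import random
import statistics


def bootstrap_residue_fraction(sequences, residue="R", iterations=10_000, seed=0):
    """Bootstrap mean and standard deviation of the fraction of `residue`.

    Assumption for this sketch: each iteration resamples whole sequences
    with replacement and recomputes the pooled residue fraction.
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(iterations):
        sample = rng.choices(sequences, k=len(sequences))
        total = sum(len(s) for s in sample)
        count = sum(s.count(residue) for s in sample)
        fractions.append(count / total)
    return statistics.mean(fractions), statistics.stdev(fractions)


# Hypothetical toy sequences, NOT taken from SwissProt 51 or PDB Select 25.
toy = ["MKRLLSAGRE", "MSTPRRQAEK", "MAVLIGFWYN", "MRRRSRSRYR"]
mean, sd = bootstrap_residue_fraction(toy, iterations=2_000)
print(f"Arg fraction: {100 * mean:.2f} ± {100 * sd:.2f}%")
```

With a real dataset parsed into a list of sequence strings and `iterations=100_000`, the same function would produce an estimate of the same form as those quoted above; the numbers printed for the toy input carry no biological meaning.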
Structurally, lysine is the amino acid closest to arginine, and it is the residue that most frequently replaces arginine in the sequences of the same protein from different organisms. However, this replacement frequency is merely 48, compared with a frequency of 83 for the replacement of glutamic acid by aspartic acid [8]. At all physiologically relevant pH values, arginine remains protonated, and, unlike lysine, it serves exclusively as a hydrogen-bond donor irrespective of pH [9]. In line with these considerations, Harms et al. (2011) conducted a systematic study of ionizable groups buried in the hydrophobic interior of proteins, using staphylococcal nuclease as a model [10]. This analysis revealed that lysine, aspartic acid, and glutamic acid residues at 25 internal positions in this protein can have highly anomalous pKa values, with some shifted by as much as 5.7 pH units relative to their normal pKa values in water [10]. In contrast, arginine residues at the same internal positions exhibited no detectable pKa shifts and were all charged at pH ≤ 10 [10]. It was also emphasized that the remarkable ability of arginine residues to remain protonated in environments otherwise incompatible with charges stems from the capacity of the guanidinium moiety of the arginine side chain to be effectively stabilized via multiple hydrogen bonds to protein polar atoms and to site-bound water molecules [10]. The authors argued that “this unique capacity of Arg side chains to retain their charge in dehydrated environments likely contributes toward the important functional roles of internal Arg residues in situations where a charge is needed in the interior of a protein, in a lipid bilayer, or in similarly hydrophobic environments” [10].
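The practical meaning of a pKa shift of this size can be illustrated with the Henderson–Hasselbalch relation for a basic side chain, where the fraction present in the protonated (charged) form is 1/(1 + 10^(pH − pKa)). In the Python sketch below, the nominal lysine pKa of 10.4 in water is an assumed, illustrative value, and the shift is assumed to be downward (toward the neutral form), as expected for a base buried in a hydrophobic interior; the 5.7-unit magnitude is the largest shift quoted above.

```python
def fraction_protonated(pH: float, pKa: float) -> float:
    """Fraction of a basic group (B + H+ <-> BH+) present as the charged acid BH+."""
    return 1.0 / (1.0 + 10 ** (pH - pKa))


PKA_LYS_WATER = 10.4  # assumed illustrative value for a solvent-exposed Lys amine
SHIFT = 5.7           # largest shift quoted above; downward direction is an assumption

for label, pka in [("in water", PKA_LYS_WATER), ("buried, shifted", PKA_LYS_WATER - SHIFT)]:
    charged = fraction_protonated(7.0, pka)
    print(f"Lys {label}: pKa {pka:.1f}, charged fraction at pH 7 = {charged:.4f}")
```

Under these assumptions, a lysine that is almost fully charged at pH 7 in water would be only about 0.5% charged after the shift, which is the kind of behavior the arginine residues in the same positions did not show.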
Arginine, along with histidine and methionine, promotes the compactness of protein tertiary structure by bringing together various secondary structure elements. These amino acids thus also contribute to the enhanced stability of proteins from thermophiles [11]. The relative hydrophobicity of arginine is lower than that of lysine, slightly higher than that of glutamic acid, and close to that of alanine [12]. While arginine shows no preference for α-helices, β-sheets, or β-turns, among α-helices it is more often found (along with glutamic acid, glutamine, and lysine) in helices near the surface of globular proteins [13]. In terms of α-helix-forming propensity, arginine is better than lysine and glutamic acid, whereas both alanine and leucine are better than arginine [14].
Based on an analysis of the stability and of the folding and unfolding rates of 12 alanine-based α-helical peptides with nearly identical composition, each containing three pairs of positively and negatively charged residues (Glu⁻/Arg⁺, Asp⁻/Arg⁺, or Glu⁻/Lys⁺), Meuzelaar et al. (2016) reported that the Glu⁻/Arg⁺ salt bridge promoted α-helix content and stability at neutral pH, with the relative helicity and thermodynamic stability of the Glu⁻/Arg⁺ peptides following the trend (i + 4) Glu⁻/Arg⁺ > (i + 3) Glu⁻/Arg⁺ ≈ (i + 4) Arg⁺/Glu⁻ > (i + 3) Arg⁺/Glu⁻ [15]. These observations were in line with previous studies [16,17,18] and confirmed that a Glu⁻/Arg⁺-oriented salt bridge, with Glu⁻ and Arg⁺ spaced four residues apart, is the most favorable for the folded α-helical conformation [15]. Furthermore, Meuzelaar et al. (2016) showed that the optimized Glu⁻/Arg⁺-oriented salt bridge noticeably accelerated α-helix formation and slowed down α-helix unfolding, performing much better in these respects than the optimized Asp⁻/Arg⁺ and Glu⁻/Lys⁺ salt bridges [15]. However, these authors also found that the correlation between the thermodynamic and kinetic effects of salt bridges is not general: although the rates of α-helix formation at neutral pH follow the order (i + 4) Glu⁻/Arg⁺ > (i + 4) Asp⁻/Arg⁺ ≫ (i + 4) Glu⁻/Lys⁺, conformational stability follows a different order, (i + 4) Glu⁻/Arg⁺ > (i + 4) Asp⁻/Arg⁺ ≈ (i + 4) Glu⁻/Lys⁺. This suggests that salt bridges that do not contribute positively to thermodynamic stability may still play a kinetic role in the formation or unfolding of secondary structures in proteins [15].
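A simple geometric calculation shows why the i/i + 3 and i/i + 4 spacings are the relevant ones for these salt bridges: in an ideal α-helix with 3.6 residues per turn, each residue advances about 100° around the helix axis, so residues three or four positions apart end up on roughly the same face of the helix, whereas i/i + 2 pairs point in nearly opposite directions. The Python sketch below computes these angular offsets. It is an idealized illustration under the stated 3.6-residues-per-turn assumption only; it does not model side-chain rotamers or the orientation (Glu⁻/Arg⁺ versus Arg⁺/Glu⁻) effects reported by Meuzelaar et al.

```python
RESIDUES_PER_TURN = 3.6                          # ideal alpha-helix
DEG_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN      # about 100 degrees per residue


def angular_offset(k: int) -> float:
    """Angle (degrees, folded into (-180, 180]) between residues i and i + k."""
    angle = (k * DEG_PER_RESIDUE) % 360.0
    return angle - 360.0 if angle > 180.0 else angle


for k in range(1, 8):
    print(f"i, i+{k}: {angular_offset(k):7.1f} degrees")
```

In this idealized picture the i/i + 4 partner sits about 40° from residue i and the i/i + 3 partner about 60° away, both on the same helical face.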
One of the important paradigm shifts in our understanding of proteins has been the discovery of intrinsic disorder. It is now clear that disorder (or unstructure, as some prefer to call it), just like structure, plays a number of crucial roles in protein function [18–48]. Since disordered proteins do not fold under physiological conditions, ordered and disordered proteins differ noticeably at the level of their amino acid sequences: in amino acid composition, charge, flexibility, hydrophobicity, sequence complexity, and the type and rate of amino acid substitutions over evolutionary time. In fact, intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) are significantly depleted in the order-promoting residues Asn, Cys, His, Ile, Leu, Met, Phe, Trp, Tyr, and Val, and substantially enriched in the disorder-promoting residues Ala, Arg, Gln, Glu, Gly, Lys, Pro, Ser, and Thr [4,19,20,21]. Therefore, Arg is one of the disorder-promoting amino acids. Its charge, combined with its hydrophobicity and side-chain flexibility, is the deciding factor for this propensity [22].
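The order-promoting and disorder-promoting residue classes listed above can be turned into a simple composition summary. The Python sketch below computes, for a single sequence, the fractions of residues in each class and the arginine fraction. It is a minimal illustration, not a disorder predictor: the function name and the example peptide are hypothetical, and Asp, which is not assigned to either class in the lists above, is counted in neither.

```python
# Residue classes exactly as listed in the text (one-letter codes).
ORDER_PROMOTING = set("NCHILMFWYV")    # Asn, Cys, His, Ile, Leu, Met, Phe, Trp, Tyr, Val
DISORDER_PROMOTING = set("ARQEGKPST")  # Ala, Arg, Gln, Glu, Gly, Lys, Pro, Ser, Thr


def composition_summary(seq: str) -> dict:
    """Fractions of order-promoting, disorder-promoting, and Arg residues in seq."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    return {
        "order_promoting": sum(aa in ORDER_PROMOTING for aa in seq) / n,
        "disorder_promoting": sum(aa in DISORDER_PROMOTING for aa in seq) / n,
        "arginine": seq.count("R") / n,
    }


# Hypothetical arginine-rich peptide, used only to exercise the function.
print(composition_summary("MARRSRRRYRRSTRRRPRR"))
```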
It has been pointed out that arginine-rich motifs (ARMs), which are among the common RNA-binding domains in proteins, can function as independent recognition units, separated from the proteins in which they are found [10,23]. Although the ARMs of different proteins are all characterized by a preponderance of arginine residues, they have low overall sequence similarity and appear to have arisen independently throughout phylogeny [10]. It was further noted that, since ARMs are intrinsically disordered, “individual arginine residues govern binding to an RNA ligand, and the inherent flexibility of the peptide backbone may make it possible for ‘semi-specific’ recognition of a discrete set of RNAs by a discrete set of ARM peptides and proteins” [10].
Among proteins with extremely high arginine content, a special place is occupied by protamines, the small and most abundant sperm nuclear proteins of many species [24], which were isolated from sperm by Friedrich Miescher almost 150 years ago [25]. In fact, 48% of the residues in the amino acid sequences of human protamines are arginines [24], and true protamines can contain up to 70% arginine [26]. In line with these observations, Figure 1A shows that, as a protein family, protamines are characterized by exceptionally high arginine content. Given this highly biased amino acid composition (see Figure 1A), it is not surprising that almost all members of the protamine family are intrinsically disordered (see Figure 1B,C). This is in agreement with the rather limited experimental data showing that free protamine is unstructured in solution [27].
Synthesized in the late-stage spermatids of many animals and plants, protamines are responsible for condensing the spermatid genome into a genetically inactive state. In addition, at least seven major functions have been ascribed to protamines: tight packaging and condensation of the paternal genome into a compact and hydrodynamic nucleus, required for the fast movement of spermatozoa; protection of the paternal genetic message by making it inaccessible to nucleases or mutagens; removal of transcription factors and other proteins, leading to a blank paternal genetic message devoid of epigenetic information; imprinting of the paternal genome during spermatogenesis; acting as an epigenetic mark on some regions of the sperm genome; acting as a checkpoint during spermiogenesis; and, potentially, playing specific roles in the fertilized egg [24,35]. All vertebrate protamines contain a set of small “anchoring” domains that are rich in arginine or lysine and are used in DNA binding, as well as multiple serine and threonine residues that are often targeted for phosphorylation [26].
Protamine binding to DNA generates a remarkably stable complex, in which a protamine wraps around the DNA helix in the major groove [36] and the tight binding of one protamine molecule per turn of the DNA helix [37] is achieved via “the combination of hydrogen bonds and electrostatic bonds that form between the guanidinium groups of each arginine residue in the anchoring domains of the protamine and the phosphate groups in both DNA strands” [26].
3. Cation-π Interactions Often Involve Arginine
Apart from forming ionic bonds and H-bonds, arginine engages in cation-π interactions with aromatic amino acids and with aromatic ligands of arginine-containing proteins, and these interactions are rather important in structural biology. Cation-π interactions originate because the “electron-rich π system above and below the benzene ring is partially negative and this negatively charged region of the quadrupole interacts with positively charged amino acids” [38,39]. Such cation-π interactions have long been recognized as an important type of non-covalent bond, playing a number of crucial roles in the function of both structured and intrinsically disordered proteins. These roles include the molecular recognition of ligands by proteins, as evidenced by the cation-π interaction between the ammonium ion of choline and the indole group of tryptophan in cholinesterase [40]. Similarly, cation-π interactions involving arginine residues were found to contribute significantly to the recognition of the Rev peptide by the HIV Rev-responsive element [41]. In proteins, phenylalanine, tyrosine, and tryptophan can act as the π-systems, and lysine and arginine can act as the cations, whereas histidine represents a unique case, as it can be either a cation or a π-system. Hohlweg et al. (2018) discussed how the cation-π interaction facilitates the placement of arginine in the transmembrane helical environment of an ATPase [42]. More importantly, arginine has a special role in proton translocation by this ATPase [42].
Early studies relied solely on geometric criteria to identify cation-π interactions in protein structures. Justin P. Gallivan and Dennis A. Dougherty [43] discussed why this is problematic and factored in energetic considerations. Their estimates showed that more than 25% of tryptophan residues in the PDB can form cation-π interactions and that arginine is more likely than lysine to participate in these interactions [43]. Kumar et al. (2018) explained why cation-π interactions may be more common with arginine than with lysine [44]. In the case of lysine, the interaction has a much higher contribution from electrostatic forces, whereas in the case of arginine, electrostatic and dispersion forces contribute about equally. As a result, arginine-π interactions are less affected by polar environments [44]. A recent review provides an update on the importance of cation-π interactions in diverse areas, including healthcare [45]. The authors pointed out that stabilization due to the cation-π interaction is greater at higher temperatures, a feature that appears to be exploited by enzymes from thermophiles, which have more cation-π interactions than their mesophilic counterparts [45].
Cation-π interactions also play an important role in protein-nucleic acid interactions. For example, 72–87% of known protein-RNA interfaces are recognized to contain arginine [46,47]. This enrichment of protein-RNA interfaces in arginine is not random, as substitution of arginine with lysine often decreases the specificity of protein-RNA complexes [48]. Furthermore, the interaction between arginine-containing proteins and RNA has long been described within the framework of the arginine-fork model [48], in which a single arginine forms four complementary contacts with nearby phosphates, yielding a two-pronged backbone readout. Recently, however, it was established that, in addition to arginine interactions with the phosphate backbone and the major-groove edge of guanine, there are also simultaneous cation-π contacts between the guanidinium group of arginine and flanking nucleobases [49]. This is illustrated by Figure 2, which represents different modes of the two-pronged arginine interaction with RNA [49]. Here, Figure 2A shows the arginine-fork model proposed to explain the extraordinary specificity of arginine in the interaction between the HIV-1 Tat protein and TAR RNA [48]. Figure 2B underscores the possibility of guanidinium interactions with both phosphate and nucleobases, as evidenced by computational modeling of arginine bound to TAR [50]. Finally, Figure 2C shows the related details of TAR in complex with TBP6.7, where “R47 salt-bridges to the Uri23 phosphate with simultaneous cation-π interactions between the guanidinium and nucleobases Ade22 and Uri23” [51].
Involvement of cation-π interactions between aromatic residues, such as tyrosine, and basic residues, such as arginine, in the liquid-liquid phase separation (LLPS) of several intrinsically disordered proteins has been reported [52,53,54]. Recently, multiple cation-π interactions were shown to be responsible for the LLPS of the complex between synaptophysin and synapsin [39]. Here, the C-terminal region of synaptophysin (residues 219–308) contains 10 repeat regions, 9 of which start with tyrosine (Y-G-P/Q-Q-G) [55], and therefore acts as a donor of π systems. On the other hand, synapsin contains 85 positively charged amino acids and has a polybasic C-terminal intrinsically disordered region containing 31 positively charged residues, most of which (21/31) are arginines [56,57]; synapsin thereby acts as a source of cations, many of which are arginine residues. The crucial role of multiple cation-π interactions in synaptophysin-synapsin coacervation was supported by the observation that a synaptophysin mutant in which all nine tyrosine residues (Y245, Y250, Y257, Y263, Y269, Y273, Y284, Y290, and Y295) were replaced with serine failed to coacervate with synapsin, even though this mutant retains the negative charge of synaptophysin (−8.3) [39]. Based on these findings, the authors concluded that “these results are consistent with the possibility that multivalent electrostatic π–cation interactions rather than simple negative–positive charge interactions mainly govern the coacervation between synaptophysin and synapsin in living cells” [39].
In the same vein, the phase separation of the fused in sarcoma (FUS) protein was shown to be regulated by cation-π interactions between arginine and tyrosine residues [54]. Arginine methylation affects this cation-π interaction and can prevent phase separation [45] (see the section on post-translational modifications below, where arginine methyltransferases are discussed). Furthermore, a novel arginine-based interaction, arginine π-stacking, was recently described [58]. This interaction mode involves the π-cloud of arginine’s own guanidino group. Tau protein aggregation leading to the formation of amyloid-like fibrils underlies the pathogenesis of various tauopathies, including Alzheimer’s disease [58]. It was proposed that π-stacking by arginine residues can promote aberrant fibril interactions and can also drive the binding of other proteins to tau fibrils, thereby contributing to the formation of toxic aggregates [58].
Although both cationic arginine and lysine residues are common in proteins capable of phase separation, arginine-rich proteins undergo LLPS more readily than lysine-rich proteins [54,59,60,61,62,63,64]. This observation supports the accepted notion that arginine, which is abundant in RNA-binding proteins (RBPs), is an important driver of LLPS [54,60,61,62,63,64]. The difference between the abilities of lysine and arginine to drive LLPS has been ascribed to the fact that, in addition to forming cation-π interactions (with arginine forming stronger cation-π interactions with aromatic groups than lysine), arginine is capable of forming π-π interactions [64,60,65,66,67,68,69]. However, it was recently established that the reentrant phase behavior and tunable viscoelastic properties of the dense LLPS phase are determined by the hydrophobicity of arginine [59].
An important case of arginine-centric pathogenesis is given by a set of dipeptide repeat proteins generated as a result of a GGGGCC (G4C2) hexanucleotide repeat expansion (HRE) in the 5′ non-coding region of the chromosome 9 open reading frame 72 (C9orf72) gene, which is the most common genetic cause of amyotrophic lateral sclerosis (ALS) and frontotemporal lobar degeneration (FTLD) [70,71,72,73]. Despite its intronic localization and the lack of an ATG start codon, this expanded microsatellite region was shown to encode a series of dipeptide repeat (DPR) proteins, which can be produced by sense translation (poly(Gly-Ala), poly(Gly-Pro), and poly(Gly-Arg)) [74,75] as well as antisense translation (poly(Pro-Ala), poly(Pro-Arg), and poly(Pro-Gly)) [76]. All six DPR species are produced from the expanded microsatellite region via repeat-associated non-ATG (RAN) translation [76] and can be found in CNS tissue from ALS and FTLD patients [74,75,76]. Since the pathogenic expanded repeat region of C9orf72 can range from 45 to several thousand units in length [73], the resulting DPR polypeptides can be very long. The highly charged arginine-rich polypeptides poly(Gly-Arg) and poly(Pro-Arg) were reported to be the most toxic species in Drosophila, yeast, and mammalian primary neurons [77,78,79,80,81]. Furthermore, both sense and antisense RNA foci from C9orf72 expansions were found in the same cell, indicating that multiple DPRs can be translated simultaneously [82,83], which can give rise to complex biological interactions originating from the simultaneous expression of multiple DPR variants [84].
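The six DPR species listed above follow directly from reading the G4C2 repeat in all three frames on each strand. The Python sketch below performs this purely in silico translation. It uses only the six standard-genetic-code codons that occur in the repeat and its reverse complement, and the function names are hypothetical; it does not model RAN translation itself (initiation without an ATG codon), only which dipeptide each reading frame encodes.

```python
# Subset of the standard genetic code: only the codons that occur in
# (GGGGCC)n and in its reverse complement (GGCCCC)n.
CODONS = {
    "GGG": "G", "GGC": "G",  # glycine
    "GCC": "A",              # alanine
    "CCG": "P", "CCC": "P",  # proline
    "CGG": "R",              # arginine
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int) -> str:
    """Translate seq starting at offset frame (0, 1 or 2); a trailing partial codon is ignored."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


def ran_products(unit: str = "GGGGCC", n_repeats: int = 10) -> dict:
    """Repeating dipeptide encoded by each of the six reading frames of a repeat."""
    sense = unit * n_repeats
    antisense = reverse_complement(sense)
    return {
        "sense": [translate(sense, f)[:2] for f in range(3)],
        "antisense": [translate(antisense, f)[:2] for f in range(3)],
    }


print(ran_products())  # sense frames -> GA, GP, GR; antisense frames -> GP, AP, PR
```

The dipeptide reported for each frame depends on where that frame begins within the repeat, so antisense poly(Pro-Gly) appears as "GP" and poly(Pro-Ala) as "AP"; these are the same repeating polymers as those named in the text.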
In cellular model studies, the arginine-rich DPRs were found in the nucleolus, causing impaired rRNA synthesis and ribosome biogenesis [85,86]. Furthermore, most of the binding partners found in the overlapping interactomes of poly(Gly-Arg) and poly(Pro-Arg) were reported to be proteins with low-complexity domains (LCDs) and ribosomal proteins [87,88,89]. Based on these and similar observations, Hana M. Odeh and James Shorter concluded that one of the important mechanisms underlying the high toxicity of arginine-rich DPRs may be the capability of these polypeptides to cause aberrations in cellular LLPS processes [90]. Furthermore, poly(Gly-Arg) was shown to form cytoplasmic inclusions that sequester RNA and RNA-binding proteins, including stress granule (SG) proteins such as Ras GTPase-activating protein-binding protein 1 (G3BP1), the key driver of SG assembly, as well as YTH domain-containing family (YTHDF) proteins, which are capable of binding N6-methyladenosine (m6A)-modified mRNAs [91]. Figure 3 represents this pathological mechanism, showing the importance of the interplay between poly(Gly-Arg), RNA, RNA-binding proteins (RBPs), G3BP1, YTHDF, and m6A-mRNA in the development of these cytoplasmic biomolecular condensates [91]. Therefore, the capability of arginine-rich DPR proteins to alter LLPS and the biogenesis of various membraneless organelles (MLOs) represents an important mechanism contributing to the potential of these polypeptides to promote nucleolar toxicity, inhibit protein synthesis, impair ribosomal RNA processing and ribosome biogenesis, and interact with RNA-binding proteins [80,81,86,89,92,93,94,95,96].
4. Importance of Arginine in Post-Translational Modifications of Proteins
Post-translational modifications (PTMs) “affect localization, interaction state, stability, and turnover of proteins” [97]. PTMs are extensively involved in regulating the biological activities of proteins and are part of the mechanisms by which signal transduction takes place [98,99]. For example, when a protein performs a moonlighting function, the PTM may be absent or different. PTMs are also involved in redox homeostasis [100]. IDPs and IDRs are more prone to PTMs than ordered proteins and domains [19,97,98,99,101,102,103,104,105,106].
Arginine, being a disorder-promoting amino acid, is also targeted by a few biologically important PTMs [107]. However, an important consequence of having a guanidino group (in arginine) instead of an amino group (in lysine) in the side chain is a drastic reduction in the variety of post-translational modifications of the side chain. Lysine is known to undergo many different kinds of acylation of its side-chain amino group in proteins. In fact, more than 100,000 sites of Lys modification in over 10,000 proteins have been mapped [108], and the list of lysine-centric PTMs includes acetylation, methylation, ubiquitination, SUMOylation, NEDDylation, propionylation, butyrylation, crotonylation, malonylation, succinylation, glutarylation, β-hydroxybutyrylation, 2-hydroxyisobutyrylation, lactylation, and benzoylation [109]. This wide range of reversible lysine-based PTMs is known to regulate enzyme activities, chromatin structure, protein-protein interactions, protein stability, and cellular localization. Acetylation alone regulates a large set of biological functions, such as epigenetics, homeostasis, metabolism, signal transduction, the cell cycle, DNA repair, transcription, development, and aging [110]. The biological importance of lysine PTMs is borne out by the existence of writers (enzymes that introduce these PTMs), erasers (enzymes that reverse them), and readers (proteins responsible for the downstream outputs of these PTMs) [109]. In contrast, only two kinds of PTMs have been reported for arginine, methylation and citrullination, which are catalyzed by arginine methyltransferases and arginine deiminases, respectively [107]. Citrullination leads to the production of autoantibodies in the pathogenesis of rheumatoid arthritis [111].
Both lysine and arginine undergo methylation. Methylation of histones and of other regulatory proteins involved in RNA binding, transcription, translation, chaperone activity, etc., is a key regulatory mechanism of cellular metabolism. In fact, protein methylation is a PTM involved in a vast number of processes [112], and “1% of the functional genome encodes for the enzymes catalyzing protein methylation” [112,113]. Protein methylation involves the transfer of a methyl group (CH3) onto either an arginine or a lysine residue, with arginine methylation being far more common on the proteomic scale than lysine methylation [114,115]. Both mono- and dimethylarginines are formed, the latter in either asymmetric or symmetric form. Arginine methylation is catalyzed by protein arginine methyltransferases (PRMTs) [116], which are “ubiquitously expressed both at the cellular compartmentalization, and tissue expression levels” [112]. There are “three types of [PRMTs] that catalyse this reaction, each responsible for a different ArgMe end-product: Type I PRMTs (PRMT1, -2, -3, -4, -6 and -8) lead to asymmetric dimethylarginine (ADMA); Type II PRMTs (PRMT5 and -9) produce symmetric dimethylarginine (SDMA); and Type III PRMT (PRMT7) forms monomethyl arginine (MMA) only” [112] (see Figure 4). ADMA and SDMA have both been reported to inhibit nitric oxide synthase [107].
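The PRMT classification quoted above is essentially a lookup table from enzyme to methylarginine end product. The Python sketch below encodes it as a dictionary so that the type and end product of a given PRMT can be retrieved. The names `PRMT_TYPES` and `methylarginine_product` are hypothetical, and the table records only the enzymes and end products named in the quotation.

```python
# Hypothetical lookup built only from the classification quoted above.
PRMT_TYPES = {
    "Type I": {"enzymes": {"PRMT1", "PRMT2", "PRMT3", "PRMT4", "PRMT6", "PRMT8"}, "product": "ADMA"},
    "Type II": {"enzymes": {"PRMT5", "PRMT9"}, "product": "SDMA"},
    "Type III": {"enzymes": {"PRMT7"}, "product": "MMA"},
}


def methylarginine_product(enzyme: str) -> tuple[str, str]:
    """Return (PRMT type, end product) for an enzyme named in the classification."""
    name = enzyme.strip().upper()
    for prmt_type, info in PRMT_TYPES.items():
        if name in info["enzymes"]:
            return prmt_type, info["product"]
    raise KeyError(f"{enzyme!r} is not listed in the quoted classification")


for enzyme in ("PRMT1", "PRMT5", "PRMT7"):
    print(enzyme, *methylarginine_product(enzyme))
```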
Samuel et al. (2021) have discussed inhibitors of protein methyltransferases, which have shown promise as new therapies for certain types of brain cancer [112]. Figure 5 outlines how arginine methylation is involved in oncogenesis. Apart from being inhibited by synthetic small-molecule inhibitors, arginine methyltransferases are also controlled by alternative splicing, PTMs, miRNAs, and interactions with other proteins [112].
The importance of arginine methylation in renal transplantation has attracted considerable attention [117,118,119,120], as circulating ADMA, an endogenous inhibitor of nitric oxide (NO) synthase, is involved in the progression of kidney disease and is associated with mortality in renal transplant recipients (RTR) and with an increased risk of end-stage renal disease in chronic kidney disease (CKD) populations [121]. Furthermore, hypertension has also been associated with elevated circulating ADMA concentrations [122], whereas ADMA and SDMA are considered cardiovascular risk factors and have emerged as predictors of cardiovascular events and death in a range of pathologies [118,119,120,123,124]. Recently, arginine methylation has been shown to be useful as a biomarker of cardiovascular diseases [125].
5. Crosslinks of Arginines and Their Biological Importance
Apart from the aforementioned PTMs of arginine, its participation in crosslink formation and the implications of such crosslinks have been a focus of attention for a number of years. Crosslinking occurs both in vivo and in vitro [126]. Although chemical crosslinking usually targets the amino group of lysine or the -SH group of cysteine, crosslinks involving arginine have also been reported to form in vitro [127]. Collier et al. (2016) described the formation of four types of lysine-arginine crosslinks in the collagen of bones, tendons, ligaments, and dermis as part of non-enzymatic glycation [128]. These advanced glycation end products (AGEs) are formed in diabetes and other age-related diseases [128]. Because type I collagen has a long half-life, which can be up to 200 years in tendon [129], this protein is particularly prone to AGE crosslinking in a number of different tissues [128]. Depending on the reactive dicarbonyl donor, such as glucose, 3-deoxyglucosone, methylglyoxal, or glyoxal, four lysine-arginine AGE crosslinks are commonly formed: glucosepane, 3-deoxyglucosone-derived imidazolium crosslink (DOGDIC), methylglyoxal-derived imidazolium crosslink (MODIC), and glyoxal-derived imidazolium crosslink (GODIC) [128]. These AGE crosslinks were also reported to occur in human lens protein, at concentrations of 132.3–241.7 pmol/mg of protein for glucosepane, 1.3–8.0 pmol/mg of protein for DOGDIC, and 40.7–97.2 pmol/mg of protein for MODIC, whereas GODIC was below the quantification limit of the instrument [130].
Photosensitized crosslinking of the cornea has many clinical applications, including stromal stiffening for the treatment of ectatic diseases such as keratoconus, photobonding of LASIK flaps to the corneal stroma, and sealing of wounds and lacerations [131,132,133,134,135]. While riboflavin had been used earlier for photosensitized crosslinking, its replacement by rose bengal has several advantages [136], including the important ability of rose bengal molecules to remain in a ∼100 µm layer of stroma near the epithelial surface rather than diffusing throughout, as riboflavin does [137,138]. Protocols performed both in the presence and in the absence of oxygen have been described [136]. Wertheimer et al. (2020) reported that arginine, acting as an electron donor, promoted corneal photosensitized crosslinking by rose bengal even in the absence of oxygen [136].
A collagen-chitosan 3D hybrid scaffold was reported to be “cross-linked by arginine” to improve the stability of this scaffold for tissue engineering [139]. No explanation of how arginine acts as a crosslinker was given; even more intriguing was the statement that crosslinking could also be carried out in the absence of arginine. The protocol for preparing the scaffold involved freeze-drying, so it is likely that arginine helped stabilize the scaffold structure during the freezing or drying stage [139].
An unusual catalytic reaction was recently described for one of the radical S-adenosylmethionine (RaS) enzymes (a diverse protein superfamily capable of catalyzing chemically difficult transformations), which led to the formation of a crosslink between arginine and tyrosine during the biosynthesis of ribosomally synthesized and post-translationally modified peptides (RiPPs, natural products with diverse structures and functions) [140]. This arginine-tyrosine crosslinking resulted in the “installation of a macrocyclic carbon–carbon bond that links the unactivated δ-carbon of an arginine side chain to the ortho-position of a tyrosine-phenol”, thereby generating a unique macrocyclization motif [140].
DNA-binding proteins frequently use arginine residues for interaction with DNA [141]. This observation was used to construct reactive DNA probes for the proximity labeling of DNA-binding proteins [142]. For example, 1,3-diketone-modified nucleotides and DNA capable of crosslinking with arginine-containing peptides and proteins were synthesized as probes to identify the binding regions of such proteins, including histones [142]. It should be mentioned here that chemical modification of the guanidino group has been mostly restricted to reactions with diketones, such as phenylglyoxal, 1,2-cyclohexanedione, and a trimer of 2,3-butanedione [143,144,145].
Jones et al. (2019) described the synthesis of two crosslinkers: a homobifunctional one (based on aromatic glyoxal moieties) for forming arginine-arginine crosslinks, and a heterobifunctional one (based on diketone and NHS moieties) for forming lysine-arginine crosslinks. In these designs, attention was paid to the typical distances found in protein-protein interactions involving lysine and arginine groups [3], which were factored in when deciding the span between the two reactive ends of the crosslinkers [126].
Because arginine has been shown to be a very useful bioactive component owing to its excellent biosafety, antimicrobial properties, and therapeutic effects on wound healing, and because it can also be used to treat specific pathological conditions, such as diabetes and trauma/hemorrhagic shock, there are multiple forms of arginine-based therapy [146]. The usefulness of arginine for wound healing has been known for decades, and the many arginine-based systems for wound healing can be classified into direct approaches, based on supplementation with free arginine, and indirect approaches, based on arginine derivatives from which modified arginine can be released after biodegradation, e.g., from wound-healing dressings [146]. Arginine can be incorporated into wound-dressing materials by electrostatic attachment to high-molecular-weight hyaluronic acid (HA, one of the most common extracellular matrix biomacromolecules) [147] or to lignin nanofibrils [148], or by combination with silicon and inositol to form the arginine silicate inositol (ASI) complex [149]. Arginine can also be covalently attached to scaffold molecular chains via imine-type bonds or conjugated to polymer backbones [146], as exemplified by composite hydrogels of poly(vinyl alcohol) (PVA) and oxidized polysaccharides [150], arginine-grafted chitosan [151], and arginine-based poly(ester amides) (Arg-PEAs), a family of biodegradable and biocompatible synthetic polymers consisting of three nontoxic building blocks: L-arginine, diols, and dicarboxylic acids [152].
6. Arginine in the Active Sites of Enzymes
Given the positive charge on its side chain, which enables both electrostatic and cation-π interactions, it is not surprising that arginine plays an important role in the active sites of enzymes, quite often participating directly in substrate binding. Cotton et al. (1977) studied complex formation between guanidine hydrochloride and the p-nitrophenyl phosphate dianion and suggested that the guanidino side chain of arginine correctly orients the phosphate group during the enzymatic hydrolysis of phosphate compounds [153]. The involvement of arginine in iodide binding in the active site of horseradish peroxidase was established by chemical modification with phenylglyoxal [154]. Even earlier, chemical modification with diones had been used to identify arginine as an important active-site residue of avian liver phosphoenolpyruvate carboxykinase, an enzyme of the central gluconeogenic pathway [155]. Kinetic data indicated that this active-site arginine is involved in carbon dioxide binding and activation [155]. Argininosuccinate synthase, a key enzyme of urea synthesis whose deficiency is associated with hyperammonemia, provides another example of an enzyme with an important active-site arginine [156]. Chemical modification studies also showed that this arginine participates in ATP binding during catalysis [157].
The role of arginine in enzyme active sites is not limited to binding substrates or ligands; in several cases this residue also plays a catalytic role. For example, arginine has been shown to act as a general acid catalyst in DNA cleavage by a site-specific serine recombinase [158]. In xanthine dehydrogenase, arginine participates in both the binding and the catalytic steps [159]. Ypt/Rab proteins are monomeric GTPases whose GTP hydrolysis is stimulated by GTPase-activating proteins (GAPs); five invariant arginine residues were found in the catalytic domain of Ypt/Rab GAPs, and substitution of just one of them, which acts as an arginine finger, rendered the GAPs almost completely inactive [160].
Sulphotransferases have broad specificity toward hydroxyl-containing substrates. Arginine residues were found to be critical for coenzyme binding during the sulphation of simple phenols by human phenol sulphotransferase, an important detoxification enzyme with broad substrate specificity and no known endogenous substrates [161]. More recently, a volume on sulfurtransferases presented studies that used site-directed mutagenesis to establish the role of the arginine residue in catalysis by thiosulphate sulfurtransferases (TSTs) containing the R-K-G-V-T-A motif [162]. TSTs, also known as rhodaneses, have wide applications in medicine and biotechnology. TST also produces hydrogen sulfide, which has emerged as an important player in both intra- and intercellular signaling [162].
Human type I D-myo-inositol 1,4,5-trisphosphate 5-phosphatase, an enzyme involved in regulating calcium signals, which in turn control several cellular processes, has two reactive arginine residues in its active site that play a crucial role in its enzymatic activity [163]. Both residues lie within the conserved 13-residue sequence M-N-T-R-C-P-A-W-C-D-R-I-L, which is involved in substrate recognition [163]. Analogously, the triad of arginine residues in the anion-binding pocket of the ArsC arsenate reductase encoded by plasmid R773, which catalyzes the reduction of arsenate in Escherichia coli, was shown to be involved in arsenate binding and transition-state stabilization [164]. Similarly, in sulfite-oxidizing enzymes, an active-site arginine has been shown to be critical for electron transfer from the Mo center to the heme redox center [165]. This arginine is conserved in these Mo-containing enzymes and is the site of a clinical mutation that leads to sulfite oxidase deficiency [165].
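The conserved motifs quoted above, R-K-G-V-T-A in TSTs and M-N-T-R-C-P-A-W-C-D-R-I-L in the inositol 1,4,5-trisphosphate 5-phosphatase, can be checked with a few lines of Python. This is a minimal sketch: it strips the hyphens, counts the residues, and reports the 1-based positions of arginine (R). The dictionary keys and function names are illustrative only and do not correspond to any published tool.

```python
# Conserved motifs quoted in the text, written with hyphens as in the source.
MOTIFS = {
    "inositol 1,4,5-trisphosphate 5-phosphatase": "M-N-T-R-C-P-A-W-C-D-R-I-L",
    "thiosulphate sulfurtransferase": "R-K-G-V-T-A",
}


def parse_motif(motif: str) -> str:
    """Convert a hyphen-separated one-letter motif into a plain sequence string."""
    return motif.replace("-", "").upper()


def arginine_positions(seq: str) -> list[int]:
    """Return the 1-based positions of arginine (R) in a one-letter sequence."""
    return [i for i, aa in enumerate(seq, start=1) if aa == "R"]


if __name__ == "__main__":
    for name, motif in MOTIFS.items():
        seq = parse_motif(motif)
        print(f"{name}: {len(seq)} residues, Arg at {arginine_positions(seq)}")
```

For the 5-phosphatase motif this reports 13 residues with arginines at positions 4 and 11, so the two arginines are seven residues apart within the motif. In the TST motif, the arginine is the first residue.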
An interesting role of arginine has been observed in the peroxisomal enzyme human D-amino acid oxidase (hDAAO) [166]. This enzyme degrades D-serine, the main co-agonist of N-methyl-D-aspartate (NMDA) receptors in the brain, and is therefore involved in brain function and implicated in some brain disorders. An arginine residue located at the monomer-monomer interface, 20 Å from the putative second ligand-binding site, was shown to be responsible for FAD binding. Mutation of this arginine increased the enzyme's intrinsic mistargeting to the nucleus [166]. Heme nitrite reductase, which produces NO or ammonium ion, is a key enzyme of the nitrogen cycle. Recently, an arginine residue in this enzyme was reported to assist substrate binding and to donate a proton during catalysis [167].