We have demonstrated using both in silico projections and co-assembly studies (SPR and aggregation assays) that the N-protein and TDP-43 can,
ex vivo, form a biomolecular complex. Our SPR data suggest that the C-terminal domain of TDP-43 is the major determinant of the interaction between TDP-43 and the N protein, with a KD for the full-length TDP-43 interaction of 3.78 +/- 0.44 μM, whereas that for the isolated N-terminal domain containing both RRMs is greater than 167 μM. In addition, our aggregation data suggest that the presence of RNA will further enhance this process. Given the use of full-length recombinant N protein in our assays, we cannot ascertain whether the interaction is driven by the N-CTD, the N-NTD or the full-length N protein. However, given previous observations regarding the role of RNA in mediating N protein heteropolymer formation [18], the enhancement of the LLPS observed in the aggregation assay in the presence of short RNA oligomers would be consistent with the interaction being dependent on the N-CTD. In silico predictions suggest that this is due to the formation of a complex biomolecular condensate consisting of an N protein quadriplex interacting with the intrinsically disordered domains of two TDP-43 molecules and that this is dependent on the presence of RNA.
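To put the two affinities in perspective, the following is a minimal sketch that assumes a simple 1:1 (Langmuir) binding model, a temperature of 298.15 K and a 1 M standard state. None of these assumptions is stated in the study, and the function names are illustrative only. Because the value for the isolated N-terminal construct is only a lower bound (greater than 167 μM), the fold difference and free-energy gap computed from it are lower bounds as well.

```python
import math

R_J_PER_MOL_K = 8.314462618   # molar gas constant
ASSUMED_TEMP_K = 298.15       # assumption: 25 degrees C; not stated in the study

KD_FULL_LENGTH_UM = 3.78      # reported KD, full-length TDP-43 with N protein
KD_NTD_LOWER_BOUND_UM = 167.0 # reported as "greater than 167 uM"


def fraction_bound(free_ligand_um, kd_um):
    """Fractional occupancy for an assumed 1:1 binding model."""
    return free_ligand_um / (kd_um + free_ligand_um)


def binding_free_energy_kj(kd_um, temp_k=ASSUMED_TEMP_K):
    """Standard binding free energy, dG = RT ln(KD), with KD in mol/L."""
    kd_molar = kd_um * 1e-6
    return R_J_PER_MOL_K * temp_k * math.log(kd_molar) / 1000.0


# Lower bound on how much weaker the isolated N-terminal construct binds.
fold_weaker = KD_NTD_LOWER_BOUND_UM / KD_FULL_LENGTH_UM

# Lower bound on the free-energy penalty of removing the C-terminal domain.
ddg_kj = (binding_free_energy_kj(KD_NTD_LOWER_BOUND_UM)
          - binding_free_energy_kj(KD_FULL_LENGTH_UM))

if __name__ == "__main__":
    print(f"fold weaker (lower bound): {fold_weaker:.1f}")
    print(f"ddG (lower bound): {ddg_kj:.2f} kJ/mol")
    for conc_um in (1.0, 3.78, 10.0):
        full = fraction_bound(conc_um, KD_FULL_LENGTH_UM)
        ntd = fraction_bound(conc_um, KD_NTD_LOWER_BOUND_UM)
        print(f"{conc_um:5.2f} uM: full-length {full:.2f}, N-terminal at most {ntd:.3f}")
```

Under these assumptions, a free ligand concentration equal to the full-length KD gives half-maximal occupancy for full-length TDP-43, whereas the isolated N-terminal construct would remain almost entirely unbound at the same concentration.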
3.1. The Role of RNA Binding Proteins in Biomolecular Condensate Formation
It is now understood that intracellular LLPS giving rise to membraneless organelles (MLOs) is a critical physiological process that results in the formation of dynamic, highly functional biomolecular condensates. In contrast to more classical canonical macromolecular particles (e.g., RNA), biomolecular condensates are highly dynamic and readily exchange components with their surroundings, in part due to the relatively weak, non-covalent nature of interactions within the condensates [
43]. They can range from 20 nm (e.g., interchromatin granules) to 1-6 μm (e.g., P bodies) in diameter [
44]. Because of the fluidity of their composition, biomolecular condensates can display regions of differing composition and density, potentially underlying their ability to drive independent processes within the same condensate [
45].
The most prevalent biomolecular condensates are combined RNA and protein assemblies (ribonucleoprotein (RNP) granules, or RNP bodies) such as SGs and PBs where assembly/disassembly is driven by the phase separation process. Also included in this grouping are nucleoli, nuclear paraspeckles, Cajal bodies, transport granules as well as Balbiani bodies [
46]. Key physiological functions of biomolecular condensates include major aspects of RNA metabolism, including transcription, pre-mRNA processing, subcellular localization and the regulation of translation and decay [
44,
47].
Proteins participating in biomolecular condensation fall into two broad categories: scaffold proteins, which drive reversible condensate formation, and clients, which are proteins that preferentially partition into condensates [
48]. The predominant proteins found within biomolecular condensates are RNA binding proteins (RBPs), a class of
trans-acting regulatory proteins that interact with
cis-acting RNA elements with their greatest specificity being for mRNA elements [
49,
50]. RBPs interact with cognate transcripts through a small repertoire of RNA-binding domains (RBDs) that include RNA recognition motifs (RRM; typically about 90 amino acids, with 2 α-helices packed against an antiparallel β-sheet), K homology domains (KH; a highly conserved protein domain of approximately 70 amino acids that interacts with either ssRNA or ssDNA), arginine-glycine-glycine (RGG) motifs, zinc fingers (ZnF; a family of protein motifs averaging 30 aa with a ββα topology) and DEAD/DEAH box helicase domains [
51,
52]. The combination of consecutive RRMs in an RBP increases binding affinity and specificity. In this model, we propose that the N protein is the scaffold protein which seeds a quadriplex structure in which TDP-43 is the client protein.
These observations have broader implications for a putative role for the interaction of the N protein and RBPs in the genesis of pathological biomolecular condensates in neurodegeneration. RBPs play a critical role in mRNA biogenesis, including transcription, pre-mRNA processing, localization, translation and decay [
44]. RBPs share a core set of common features that may contribute to their propensity to participate in pathological inclusion formation including their enrichment in IDRs, significantly more so than other human proteins. Within any given RBP, using prediction algorithms, the RBD is more likely to be structured (a lower predicted ID of 0.03 – 0.05) compared to the non RBD regions (a higher predicted ID of 0.30 – 0.40) and to be enriched with post-translational modifications [
53].
Cis-acting mRNA elements are not unique to an individual trans-acting RBP; thus, a single RBP can have multiple mRNA partners, and multiple RBPs can interact with a single RNA (through multiple cis-acting elements).
Our study does not account for the significant impact that post-translational modifications (PTMs) of either N protein or TDP-43 will have upon the dynamics of biomolecular condensate formation. Using mass spectrometry on in vivo cross-linked RNA-protein interactors, it has been shown that RBPs as a class are highly modified by PTMs, significantly more so than the proteins of the larger human protein database [50]. Given that RBPs are also significantly more intrinsically disordered than other protein classes and thus more flexible (or less rigid), this suggests that PTMs contribute to the dynamic regulatory roles of RBPs. In addition, a single RBP may contain multiple RBDs which can assist in co-ordinating and enhancing mRNA binding. Moreover, the presence of both an RBD and an IDR in an RBP increases the propensity for LLPS. These characteristics are readily observed in RBPs that are commonly incorporated in the pathological biomolecular condensates seen in a broad range of neurodegenerative disorders, such as the RBPs TDP-43, FUS/TLS, EWS, and RBM45 that are key to the pathological NCIs of ALS [
54,
55].
The key driving forces for LLPS are interactions between aromatic and positively charged amino acids (i.e., lysine, arginine and histidine), while glycine enhances fluidity and glutamine and serine promote hardening [46]. Proteins that undergo LLPS tend to harbour intrinsically disordered regions (IDRs) or coiled-coil domains which increase self-interaction and aggregation at higher concentrations. IDRs are composed of long stretches of low amino acid diversity lacking in hydrophobic residues (low complexity domains; LCDs) which typically mediate cooperative folding [56]. Although occasionally referred to as prion-like domains (PrLDs), PrLDs are most appropriately considered as a subset of IDRs with similarities to yeast prion proteins with enrichment in glutamine and/or asparagine residues. Typically, amino acids found within LCDs include polar (glutamine and serine) and aromatic (tyrosine) residues. Tyrosine residues generally combine with glycine and serine in forming [G/S]Y[G/S] motifs which have a tendency to form aggregates
in vitro, lack well-defined 3D structures and favour the formation of hydrogels [
44,
57]. IDRs also commonly contain repeats of arginine/serine (RS repeats) or arginine/glycine (RGG boxes), as well as arginine- or lysine-rich patches (R/K patches).
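The sequence features described above can be located in a protein sequence with simple pattern matching. The following is a minimal sketch: the [G/S]Y[G/S] and RGG patterns follow directly from the motif names in the text, whereas the definition of an RS repeat as two or more consecutive RS dipeptides, the function name and the toy sequence are assumptions made for illustration only. R/K patches are omitted because the text gives no length threshold for them.

```python
import re

# Lookahead patterns report overlapping three-residue motifs;
# the RS-repeat pattern is matched greedily and without overlap.
MOTIF_PATTERNS = {
    "[G/S]Y[G/S]": r"(?=([GS]Y[GS]))",
    "RGG": r"(?=(RGG))",
    "RS repeat": r"((?:RS){2,})",  # assumption: at least two RS dipeptides
}


def find_lcd_motifs(sequence):
    """Return {motif name: [(1-based start position, matched residues), ...]}."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIF_PATTERNS.items():
        hits[name] = [(m.start() + 1, m.group(1))
                      for m in re.finditer(pattern, seq)]
    return hits


# Hypothetical toy sequence, not taken from any real protein.
TOY_SEQUENCE = "MGYGSYSRGGRGGAARSRSRSK"

if __name__ == "__main__":
    for motif, found in find_lcd_motifs(TOY_SEQUENCE).items():
        print(motif, found)
```

A scan of this kind only flags candidate motifs by sequence; it says nothing about whether a given region is disordered or phase separates, which would require dedicated disorder predictors or experiment.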
While IDRs are the main drivers of LLPS, there are examples of non-prion like domains that drive LLPS. IDRs can also be the sole RBD in a RBP. Because of their lack of structure, IDRs can co-ordinate RNA binding in concert with other domains and demonstrate a range of specificities from high to nonselective and may promote protein-RNA co-folding upon interaction with target RNAs [
48,
57]. At low RNA:RBP stoichiometry, the RNAs will facilitate RBP phase separation whereas a high RNA:RBP stoichiometry leads to soluble RNP formation [
58]. Of note, proteins that are highly prone to condensate formation possess both RBDs and IDRs and can act as nucleating centres for increased condensate formation and potentially pathological liquid-solid phase separation (LSPS), thus giving rise to the hallmark nuclear and cytoplasmic inclusions of many neurodegenerative disorders. Biomolecular condensates tend to be enriched in proteins that have both RNA binding ability and IDRs, with an increased number of RBDs correlating with a greater propensity to phase separate [
45].
In the context of pathological biomolecular condensate formation, it is noteworthy that a number of RBPs lack a classical RBD but instead interact with RNA through an IDR. There is increasing evidence that IDRs can drive RNA/RBP interactions with a higher affinity to RNA for RBPs that contain ordered RBDs and which can transition to structured domains upon RNA binding [
52]. As such, RBPs participate in LLPS through RBDs, IDRs and (as will be discussed) through an array of post-translational modifications (PTMs).
3.2. RBP Post-Translational Modifications as Modulators of Biomolecular Condensates
PTMs are important for tuning intermolecular interactions by regulating the charge state of IDPs and thus, while capable of regulating LLPS or biomolecular condensate genesis, can also promote liquid-solid phase shift (LSPS) and ‘molecular hardening’ [46]. PTMs can lead to the weakening or enhancement of multivalent interactions in biomolecular condensates and, through either recruiting macromolecules to the complex or excluding them from it, modulate the physical state of the complex. The most well-characterized RBP PTMs are those related to cellular distribution and interactions, in particular with RNA through modulating RNA binding properties: phosphorylation, acetylation, arginine methylation and sumoylation, while ubiquitinylation is involved with degradation and turnover. Phosphorylation and methylation are key for RNA interactions (with potentially opposing effects) [51]. RBP PTMs influence RNA biogenesis by impacting subcellular localization (e.g., nuclear import), stability, degradation and translation (through regulating alternative splicing or polyadenylation), by modulating interactions with RNA and other proteins, and by modulating the propensity of RBPs for LLPS (for example, methylation may preclude phosphorylation at an adjacent site by steric hindrance) [51]. An important role for RBP PTMs is increasingly evident across a range of human diseases, including cancer and neurodegeneration, with particular relevance to ALS for TDP-43, FUS/TLS and hnRNP-A/B [46].
3.3. Evidence That the N-Protein Can Be Involved in Pathological Biomolecular Condensate Formation Relevant to Human Neurodegenerative Disease States
The postulate that a viral infection may set forward a cascade of events that can give rise to, accelerate, or be associated with an increased prevalence of neurodegenerative processes is not entirely novel [
59,
60,
61].
Neuropathological studies of individuals dying during the
acute manifestations of COVID-19 or shortly thereafter demonstrate prominent neuroinflammatory changes with the presence of SARS-CoV-2 RNA or the expression of either S or N proteins by immunohistological evidence being inconsistent and, when present, not clearly correlated with the severity of the neuroinflammatory pathology [
62,
63,
64,
65]. Neuropathological features of a hypoxic-ischemic injury often accompanied by vascular pathology with infarction alongside evidence of immune-mediated microvascular pathology and blood-brain barrier compromise in the acute stages of COVID-19 have been well documented [
66,
67,
68,
69]. The presence of diffuse microglial activation in the brainstem including microglial nodules, astrogliosis, and perivascular inflammation with parenchymal CD3+ and CD8+ T cells further supports the concept that the acute neurological manifestations of COVID-19 are the consequence of a prominent neuroinflammatory process in the absence of robust evidence of ongoing SARS-CoV-2 infection of the brain [
65,
70,
71,
72].
The question of whether the SARS-CoV-2 virus or N protein can
chronically be expressed in the human central nervous system and more specifically neurons as opposed to non-neuronal cell populations remains entirely uncertain. In vitro studies using the OC43 strain of the human coronavirus (HCoV-OC43) in primary murine hippocampal and cortical neuron-enriched cultures demonstrated that neurons were preferentially targeted by the virus, although at later timepoints, astrocytes were also infected [
73]. Infected neurons had largely disappeared by day 7 post infection. Human IPSC-derived motor neurons have been shown to be susceptible to SARS-CoV-2 infection and although the levels of viral replication were low, infection could be passaged to VeroE6 cells using supernatant derived from the infected IPSC-derived motor neurons [
74]. Of specific relevance to our work, N protein expression was detected in SARS-CoV-2 infected neurons. In vivo studies using intracerebral inoculums in BALB/c mice were associated with a progressive motor dysfunction with prominent neuronal degeneration. Macaques examined 5 to 6 weeks post SARS-CoV-2 inoculation and thus beyond the acute stages of infection demonstrated an ongoing neuroinflammatory response with microglial nodules, microglial activation, meningeal inflammation and T-cell infiltrates within the brain parenchyma [
75]. Active viral replication at this time point was not observed. Of importance to this discussion, α-synuclein aggregates were also observed in the ventral midbrain of 6 of the 8 macaques studied. Although this latter study did not examine the colocalization of α-synuclein aggregates with either SARS-CoV-2 or the N protein, the ability of both the S1 and N proteins (in particular the N-CTD) to interact with α-synuclein, the main component of Parkinson’s disease associated Lewy bodies, and accelerate pathological fibril formation, has been demonstrated both by bioinformatics approaches and in vitro [
76,
77,
78].
It remains to be resolved whether the N protein can be chronically expressed in
human brain or spinal cord and in doing so be available to either initiate or accelerate the formation of pathological biomolecular condensates with candidate RBPs. Although in PCC persistence of SARS-CoV-2 RNA and both the S and N proteins has been documented, including in plasma neuron-derived exosomes for upwards of 16 months post COVID-19, there are no studies correlating this with features of neurodegeneration [
79,
80]. There remains controversy as to whether the SARS-CoV-2 virus can directly infect human neurons or whether the neurological complications of virus exposure are the cumulative impact of multiple pathological processes. Recent in vitro evidence suggests that the SARS-CoV-2 virus can directly infect astrocytes which then function as the primary viral reservoir capable of mediating neuronal injury [
81].
In the context of a potential role for the N protein in inducing pathological oligomers with ALS-associated RBPs, as demonstrated here and elsewhere, there is clear ex vivo evidence that such a process can occur. It is noteworthy that the protein interactome of N protein is enriched for proteins associated with SGs, including several (e.g., G3BP1/2, Stau1 and Caprin1) that are associated with the pathophysiology of ALS [
27,
82,
83,
84,
85,
86]. Whether pathological LLPS or LSPS driven by the N protein occurs in the human nervous system and can be implicated in the proposed increase in prevalence of neurodegenerative diseases post COVID-19 awaits further evidence from both longitudinal case-control studies and detailed neuropathological analysis. Moreover, whether this process is determined by direct in vivo exposure of neurons to the SARS-CoV-2 virus or to the N protein itself awaits similar studies.