Preprint
Article

The Continuous Adaptive Challenge Played by Arboviruses: An In Silico Approach to Define Relevant Molecular Interactions with the Host

Submitted:

22 April 2023

Posted:

24 April 2023

Abstract
Climate change and globalization have raised the risk of vector-borne disease (VBD) introduction and spread in various European nations in recent years. In Italy, viruses carried by tropical vectors have been shown to cause viral encephalitis, one of the symptoms of arbovirosis, a spectrum of viral disorders spread by arthropods such as mosquitoes and ticks. Arboviroses are currently causing alarm and attracting attention, and the World Health Organization (WHO) has released recommendations to adopt essential measures, particularly during the hot season, to restrict the spread of the infectious agents among breeding stocks. In this scenario, rapid analysis systems are required because they can quickly provide information on potential virus-host interactions, the evolution of the infection, and the onset of disabling clinical symptoms or serious illnesses. Such systems include bioinformatics approaches integrated with molecular evaluation. Viruses have co-evolved different strategies to transcribe their own genetic material by altering the host's transcriptional machinery, even over short periods of time. The introduction of genetic alterations, particularly in RNA viruses, results in a continuous adaptive fight against the host's immune system. We propose an in silico pipeline to identify viral sequences that may interact with host RNA-binding proteins (RBPs), which play important roles in RNA metabolism and its related biological processes. Indeed, viral RNA sequences able to bind host RBPs may compete with cellular RNAs, altering important metabolic processes. Our findings suggest that the proposed in silico approach could be a useful and promising tool to investigate the complex and multiform clinical manifestations of viral encephalitis, and possibly to identify altered metabolic pathways as targets of pharmacological treatments and innovative therapeutic protocols.
Subject: Public Health and Healthcare  -   Health Policy and Services

1. Introduction

Despite significant advances, infectious diseases remain one of the leading causes of illness, disability, and death on a worldwide scale [1]. A multitude of variables affect the emergence and re-emergence of contagious diseases, including pathogen genetics, environmental changes, and the ever-increasing frequency of human and animal mobility, which increases the chance of contact between viral hosts and possibly between viral host species. Among the zoonotic diseases that have ravaged the global population in recent decades are Acquired Immunodeficiency Syndrome (AIDS), Ebola virus (EBOV) disease, Severe Acute Respiratory Syndrome (SARS), Middle East Respiratory Syndrome (MERS), avian influenza, and the recently discovered COVID-19 [2]. This scenario includes zoonoses caused by arboviruses, which are transmitted by ticks and mosquitoes and frequently cause non-specific fever and, in many cases, encephalitis and hemorrhages [3].
Arboviruses are RNA viruses that are classified according to structural and ecological criteria into the three most prevalent viral families (Togaviridae, Flaviviridae, and Bunyaviridae) [4].
The Togaviridae family is split into two genera: alphaviruses (including the Chikungunya virus) and rubiviruses (including the Rubella virus, which causes rubella) [5]. In particular, the Alphavirus genus is represented by about thirty species that are spread around the world and can infect several vertebrate species, including humans [5]. The alphavirus genome is composed of a single positive strand of RNA (+ssRNA virus) that is approximately 11-12 kb in length and is organized into two main regions: a non-structural domain located towards the 5′ end, which codes for non-structural proteins, and a structural domain near the 3′ end that codes for the structural proteins [6]. The non-structural proteins are translated as one or two polyproteins, which are then cleaved to generate four proteins known as nsP1, nsP2, nsP3, and nsP4. The structural domain is instead translated, starting from a sub-genomic mRNA (26S mRNA with approximately 4100 nucleotides), into a single polyprotein. This polyprotein is processed to produce the envelope proteins (E1 and PE2, the precursor of E2), a capsid protein (protein C), and two small polypeptides called E3 and 6K. Protein C is immediately assembled with genomic RNA into the nucleocapsid, and PE2 and E1 are transported with 6K to the plasma membrane; PE2 is then processed into the E2 protein [6,7]. Alphavirus infection has a 3–12-day incubation period and results in flu-like symptoms such as high fever, chills, headache, nausea, vomiting, and, most importantly, arthralgias, which severely restrict mobility. The fever lasts about 4 days but may be followed by a second stage of the disease, characterized by a diffuse itchy maculopapular rash and fever relapse. More serious consequences are uncommon and are either hemorrhagic (within 3-5 days) or neurological, mainly in children [8].
Flavivirus is a genus of +ssRNA viruses that belongs to the Flaviviridae family. Most of these viruses are classified as arthropod-borne viruses because they are transmitted through the bite of an infected hematophagous arthropod, primarily mosquitoes or ticks [9]. More than 70 species have been identified as a result of phylogenetic analyses aimed at defining the relationships between the various flaviviruses, which are classified into three groupings (clusters) and 14 serocomplexes distributed over distinct clades. The clusters are defined by the vector involved: mosquito-borne flavivirus (MBF), tick-borne flavivirus (TBF), and no-known-vector (NKV) viruses, for which no vector has been discovered [10]. The flavivirus genome is around 10-12 kb in size and has a single Open Reading Frame (ORF) that is translated into a polyprotein processed by cellular and viral proteases. Untranslated regions (UTRs), which are important in the replication process, flank the ORF at the 5′ and 3′ ends. The 5′ end contains a type 1 cap (m7GpppAmG) followed by a conserved stem-loop structure, whereas the 3′ end of the genome terminates with a conserved CU-OH, rather than a poly(A) tract, which serves as the recognition site for the viral RNA-dependent RNA polymerase (RdRp) [11]. Flaviviruses enter cells by endocytosis, which is mediated by mannose receptors, glycosaminoglycans, or DC-SIGN receptors (a C-type lectin receptor present on both macrophage and dendritic cell surfaces) that bind to the envelope protein E6. The low pH environment that characterizes the interior of the endosomal vesicle triggers the fusion of the viral envelope with the vesicle membrane, favoring the removal of the virus coat and the release of the genome into the cytoplasm of the infected cell [12]. Several Flavivirus species are significant human pathogens that can affect the homeostasis of the central nervous system (CNS). In fact, neurotropic viruses can produce neurological dysfunctions in the infected individual.
These neurological conditions can advance and give rise to major inflammatory disorders that alter the CNS architecture and have a poor or even fatal prognosis. Flaviviruses include USUV (Usutu virus), WNV (West Nile virus), JEV (Japanese encephalitis virus), and MVEV (Murray Valley encephalitis virus), all of which share the neurotropism necessary to cause acute or enduring infections [13]. The virus leaves peripheral organs between the sixth and eighth day after infection but, owing to its ability to cross the blood-brain barrier (BBB), it persists in the brain and spinal cord. Several hypotheses have been proposed to explain how flaviviruses enter the CNS via the BBB. The proposed routes include leukocytes carrying the virus across the BBB, direct virus entry after infection of brain endothelial cells, which results in impaired barrier integrity, and retrograde axonal transport-mediated virus entry after peripheral nervous system infection (e.g., olfactory nerve infection) [14].
The Bunyaviridae family (BUNV) contains several arthropod-borne and rodent-borne viruses; it takes its name from Bunyamwera (Uganda), where the first virus was isolated from mosquitoes. These viruses cause febrile diseases in humans and other vertebrates. The life cycle involves a rodent host or an arthropod vector together with a vertebrate host [15]. The majority are arboviruses (arthropod-borne viruses) that are primarily spread by arthropods (mosquitoes, ticks, sandflies). Phylogenetic analyses allowed their classification into four main genera of medical interest (Bunyavirus, Phlebovirus, Nairovirus, Hantavirus), divided into 35 serogroups with more than 300 virus species and strains [16]. The entire genome is 11-12 kb in length and consists of linear, negative-sense, single-stranded RNA (−ssRNA) divided into small (S), medium (M), and large (L) segments. A nonstructural protein (NSs) and a nucleocapsid protein (N) are both encoded by the S segment on a single mRNA with overlapping open reading frames (ORFs). The NSm polypeptide and two envelope glycoproteins (G1 and G2) are encoded by the M segment and produced by the cleavage of a single polyprotein. Finally, the RNA-dependent RNA polymerase is encoded by the L segment [17].
Herein, we describe an in silico approach that uses different bioinformatics tools to analyze the entire genomic sequences of the main Togaviridae, Flaviviridae, and Bunyaviridae associated with encephalopathy and to identify specific conserved motifs capable of interacting with host proteins. Different positive and negative single-strand RNA viruses can sequester host RNA-binding proteins (RBPs) to speed up the replication process. This depletion could alter the host cell network, interfering with nucleus-cytoplasmic traffic and causing a spatial redistribution of proteins from the nucleus to the cytoplasm.

2. Results

The in silico study of the whole genomes of Togaviridae, Flaviviridae, and Bunyaviridae revealed three significant unique motifs. These motifs, found across the viral strains investigated, are marked with a color code (red, green, and light blue) in Figure 1A and are distinguished by their ability to bind a total of 25 different RBPs. The first motif is present in all the examined strains and likely reflects an ancestral infection mechanism that has been retained across species during evolution. The other two motifs were identified in 80% and 68% of the examined strains, respectively (Figure 1B). All the RBPs found in association with the conserved RNA motifs are closely related, as shown by STRING analysis (Figure 2A); moreover, the described alterations of these proteins, due to the infective pathogenic processes, are associated with several phenotypic neurological symptoms.
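The prevalence figures reported above (the share of strains carrying each motif) can be illustrated with a minimal sketch. It assumes that a motif can be approximated by a regular expression, whereas motif discovery tools typically describe motifs as position weight matrices; the function name, strain names, sequences, and motif below are all hypothetical and are not taken from the analysis.

```python
import re


def motif_prevalence(motif_regex, genomes):
    """Fraction of genomes that contain at least one match of the motif."""
    pattern = re.compile(motif_regex)
    hits = sum(1 for seq in genomes.values() if pattern.search(seq))
    return hits / len(genomes)


# Hypothetical toy data: strain names, sequences, and the motif are invented.
toy_genomes = {
    "strain_A": "AUGGCAAAAAAAUUCG",
    "strain_B": "AUGCCGUAAAAAAAGC",
    "strain_C": "AUGCGCGCGCUUAGCU",
    "strain_D": "GGAAAAAACCUUAGGA",
    "strain_E": "CCGGUUAACCGGUUAA",
}
toy_motif = "A{6,}"  # six or more consecutive adenines (illustrative only)

print(f"{motif_prevalence(toy_motif, toy_genomes):.0%}")  # prints 60%
```

A regular expression keeps the example short; a faithful reimplementation would score each genome position against the motif matrix and apply a significance threshold.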

2.1. Analysis of the First Motif (Red Motif)

The analysis revealed that the first motif has a direct interaction with proteins from the poly(A)-binding protein (PABP) family (PABPC1 (Poly(A) Binding Protein Cytoplasmic 1), PABPC4 (Poly(A) Binding Protein Cytoplasmic 4), PABPN1 (poly(A) binding protein nuclear 1), PABPC3 (Poly(A) Binding Protein Cytoplasmic 3)).
Initially, proteins of the PABP family were thought merely to protect the mRNA poly(A) tail. It is now recognised that they interact selectively with particular mRNA sequences and play an important role in the metabolism of distinct mRNAs. The function of PABPs is complicated by their interactions with components involved in several physiological processes, including mRNA metabolic pathways, polyadenylation/deadenylation, mRNA export, translation, degradation, and expression regulation during development [22]. PABPC1, PABPC4, and PABPN1 can bind the first motif, whereas PABPC3 can bind the third motif.
Through alternative polyadenylation at the 3′ end of the RNA, PABPC1 recruits hnRNPLL (heterogeneous nuclear ribonucleoprotein L like), which regulates the conversion of membrane Ig to secreted Ig in B cells [23]. While PABPN1 and PABPC3 are specifically necessary for progressive and efficient polymerization of poly(A) tails and are involved in cytoplasmic regulatory processes of mRNA metabolism, PABPC4 mRNA levels increase during T cell activation and regulate the stability of labile mRNA species [22,24,25].
The motif is also recognized by two SR family splicing factor proteins (SRSF2, SRSF10) that are involved in constitutive and alternative pre-mRNA splicing and are characterized by an RNA recognition motif and an arginine/serine-rich (RS) domain [26].
Additionally, the motif binds the proteins CNOT4 (CCR4-NOT Transcription Complex Subunit 4), HuR (ELAV Like RNA Binding Protein 1, ELAVL1), LIN28A (Lin-28 Homolog A), MATR3 (Matrin 3), PTBP1 (Polypyrimidine tract binding protein), SART3 (Spliceosome Associated Factor 3, U4/U6 Recycling Protein), TIA1 (T-Cell-Restricted Intracellular Antigen-1), U2AF2 (U2 Small Nuclear RNA Auxiliary Factor 2), ESRP2 (Epithelial Splicing Regulatory Protein 2), YBX1 (Y-Box Binding Protein 1), and YBX2 (Y-Box Binding Protein 2).
CNOT4 with insufficient E3 ubiquitin ligase activity has been associated with heart disease showing altered QT interval length. Cardiac arrhythmias have been reported in cases of West Nile Virus (WNV) encephalitis and in patients with the Ebola virus. The depletion of this protein may be related to the negative effects of several antiviral drugs that can result in torsades de pointes and ventricular fibrillation [27,28].
ELAVL1 is associated with CELF6 (CUGBP Elav-Like Family Member 6, also known as BRUNOL6), which binds the second motif, and both proteins are involved in the regulation of alternative splicing. ELAVL1 and CELF6 exert additive control in human pathology: their potential double depletion has a stronger impact than their individual depletions. In particular, CELF6 depletion results in lower brain serotonin levels, which lead to behavioral abnormalities, and destabilizes synaptic genes through mRNA interactions with 3′ UTR elements [29].
LIN28A contributes to the maturation and differentiation of neuronal stem cells. In particular, LIN28A deficiency destroys dopamine neurons in the substantia nigra, causing developmental defects and Parkinson's disease (PD) [30]. Infection with mosquito-borne alphavirus causes selective loss of dopaminergic neurons, neuroinflammation, and widespread protein aggregation. A variety of viruses have been described with the potential to induce or contribute to the occurrence of parkinsonism and PD [31].
MATR3, which is also bound by the third motif, is a member of a subset of RBPs that has been linked to both sporadic and familial neuromuscular disease as well as to muscular and neurodegenerative diseases like amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) [32].
PTBP1, a member of the PTB family that facilitates IRES-mediated translation and activates the replication-translation switch, is necessary for effective RNA replication. PTBP1 depletion, through its recruitment into regulatory complexes during infection with several coronaviruses (CoV), DENV, and Theiler's murine encephalomyelitis virus (TMEV), has recently been linked to idiopathic Parkinson's disease (iPD) [33,34].
SART3 is essential for the stabilization of complexes containing USP15, a protein that regulates NF-κB activity by increasing the stability of IκBα (nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha), participates in mRNA metabolism, and maintains brain health. These findings suggest that disruption of the SART3-USP15 cascade could result in chronic ER stress, which would accelerate the neurodegenerative phenotype [35].
The ability of SART3 to bind specifically to pre-miR-34a is also very intriguing. SART3 overexpression resulted in downregulation of the miR-34a target genes CDK4/6 and a G1 phase cell cycle arrest. The RNA-recognition motif identified in SART3 that is specific for pre-miR-34a binding supports the idea that SART3 is important for miR-34a biogenesis and might play a role in the progression of the NSCLC cell cycle. The ability of Dengue viruses to infect and replicate in human primary lung epithelium and different lung cancer cell lines is well known. The infection markedly increased the expression of IL-6 and RANTES, a chemokine mainly released by flow-dependent platelets [36].
TIA-1 protein is a transcription factor used by different viruses to support their own biology. For example, WNV produces its own RNA using TIA proteins. Furthermore, it is well known that Tick-borne encephalitis virus (TBEV) binds to viral replication sites via TIA-1 to regulate viral replication independently of the formation of stress granules (SGs) [37].
U2AF2 is thought to play a role in IL7R exon 6 skipping and changing the distribution of soluble IL7R isoforms on membranes during mRNA splicing and processing. The homeostatic cytokine interleukin-7 (IL-7) and the IL7R complex play a key role in the development and maintenance of T cells. The primary regulatory molecule in the IL7/IL7R axis, as well as its expression, are dynamically regulated during T cell activation and development. IL7R expression disruption contributes to immunopathologies, as demonstrated by severe immunodeficiencies, and loss-of-function variants in humans are strongly associated with risk for Multiple Sclerosis (MS). According to a variety of evidence, early viral infections are crucial for the development of chronic inflammatory, autoimmune, and demyelinating diseases with progressive axonal degeneration that could develop into multiple sclerosis [38].
ESRP2 does not currently appear to be connected to the occurrence of neurological disorders, although its downregulation is connected to invasive head and neck cancer and oral squamous cell carcinogenesis (OSCC). It is well known that some viruses, including the hepatitis C virus, adenoviruses, the herpes group viruses, and the human papillomavirus (HPV), have a strong correlation with oral squamous cell carcinoma [39,40].
YBX1 and YBX2 are the final two RBPs that bind the motif. It is known that the structural protein E interacts with the viral nucleocapsid of the dengue virus (DV) through the Y-box, which is necessary for the correct formation of intracellular virus particles and their secretion [41]. YBX2 is expressed specifically in testicular germ cells, from the spermatogonia to the spermatocyte stage, and in oocytes. ZIKV infection is widely recognised to significantly reduce spermatogenesis after producing major physiological, immunological, and endocrine damage in the testes, most likely owing to YBX2 subtraction dysregulation. Male sensitivity to flavivirus infection may be due to YBX2 expression in testicular germ cells, as evidenced by the greater incidence of antibodies in males (32.3%) compared to females [42,43].

2.2. Analysis of the Second Motif (Green Motif)

The motif shows a strong binding affinity for the FMR1 (Fragile X Messenger Ribonucleoprotein 1), HNRNPL (Heterogeneous Nuclear Ribonucleoprotein L), QKI (KH Domain Containing RNA Binding), SFPQ (Splicing Factor Proline and Glutamine Rich), and SNRPA (Small Nuclear Ribonucleoprotein Polypeptide A) proteins, in addition to CELF6 and PABPC3 recognized by the first motif.
Even though its function is still unknown, FMR1 has two KH domains and RGG box conserved in many RNA-binding proteins, suggesting that it is involved in RNA metabolism. It is well known that a concentration decrease of FMR1 is associated with poor learning and memory function, poor motor coordination, and poor sensorimotor adaptation [44].
Previous studies on the Japanese encephalitis virus suggested that the protein HNRNPL and the ribonucleoprotein HNRNPA2 support viral replication through the interaction with viral proteins and RNA. Reciprocal co-immunoprecipitation analyses in transfected and infected cells confirmed a specific interaction between the JEV core protein and the hnRNP proteins [45].
It has been demonstrated that QKI has a variety of functions in the regulation of viral infection, such as promoting the expression of Zika proteins and, surprisingly, inhibiting the replication of a clinical isolate-specific strain of Dengue virus (known as DENV4) through the interaction with a QKI response element (QRE). QKI deficiency increases viral infection through the suppression of the host IFNβ response following the downregulation of the mitochondrial antiviral-signaling protein (MAVS), which is essential for the innate immune response against RNA virus infection [46].
SFPQ, a specific DNA and RNA binding protein, is closely related to neurodegenerative diseases because of its relationship with the FUS protein (the neural homeostasis-binding Fused in sarcoma). Frontotemporal lobar degeneration (FTLD) and amyotrophic lateral sclerosis (ALS) are genetically and clinically linked to the disruption of this interaction [47].
The last RBP able to bind the motif is SNRPA, a part of the U1 small nuclear ribonucleoprotein (U1 snRNP) complex involved in the splicing of pre-mRNAs. Even though it does not appear to be correlated with a neurological defect, SNRPA is highly expressed in lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) tissue and is associated with the progression of gastric cancer (GC) [48]. The possible removal of SNRPA by the green motif could be a therapeutic target for these kinds of tumors.

2.3. Analysis of the Third Motif (Light Blue Motif)

Finally, in addition to MATR3, the last motif is bound by RBM6 (RNA Binding Motif Protein 6) and RBM24 (RNA Binding Motif Protein 24), two proteins that are both highly expressed in human brain tissue. While aberrant expression of RBM6 has been implicated in the development of human malignancies (e.g., growth and progression of laryngeal carcinoma), as reported for SART3, SNRPA, and ESRP2, RBM24 plays a specific role in myoblast differentiation and in molecular pathways related to the expression of myogenic factors and muscle functional proteins during regeneration [49,50].

2.4. Enriched Analysis

To find further connections between the RBPs involved in specific molecular pathways, additional analysis was carried out. Using the k-means clustering function of STRING (Figure 2B), with seven subclusters as a parameter (equivalent to about a third of the number of investigated proteins), we found a subcluster that emphasizes how the dysregulation of these proteins can be linked to cancer, especially leukemia. Consistent evidence was reported in a study of 12,573 dengue patients, where analyses stratified by follow-up time demonstrated that the risk of leukemia was considerably increased only between 3 and 6 years following dengue virus infection. This finding also supports the evidence that the dysregulation of RBM6, SART3, ESRP2, and SNRPA is linked to a number of tumor types.
Finally, the same list of RBPs was also investigated using Enrichr, a software tool that allows multiple searches to be run simultaneously in various databases. The expected association of the proteins with RNA metabolism was confirmed by the top 4 terms enriched in the Human KEGG (Kyoto Encyclopedia of Genes and Genomes) database and by the top 10 terms enriched in the GO Biological Process database (Figure 3A,B). In addition, in DisGeNET (a discovery platform encompassing one of the largest publicly available collections of genes, proteins, and variants relevant to human disease), RBP alterations are mainly related to neuropathy and weakness, as shown in the UMAP scatterplot (Figure 2C).
This additional evidence confirms that the host cell network is altered by the depletion or dysregulation of RBPs, which affects nucleus-cytoplasmic traffic and the spatial redistribution of proteins from the nucleus to the cytoplasm, resulting in the onset of clinical symptoms, neurological disorders in particular.

3. Discussion

Infectious diseases continue to rank among the top global causes of illness, disability, and death despite significant advancements in the treatment of viral infections [51]. Numerous factors, including environmental changes, the genetics of the pathogens, and the increased frequency of animal and human movement, which increases the possibility of contact between hosts and potentially host species, can cause contagious diseases to emerge and recur [51]. Anthropogenic alteration of areas with high biodiversity has produced a variety of hotspots where the risk of zoonosis is increasing. These hotspots arise from the creation of new areas of contact that involve human structures, natural areas, and possibly new infections [51,52]. Data on emerging infectious diseases that have plagued the human population over the last three decades show that 75% are the result of a pathogen being transferred from animals, particularly wild animals, to humans. In this spillover process the pathogen evolves and gains the ability to infect, replicate, and spread across other species, including humans [52,53]. Such situations are made even worse by intensive farming, where high animal density and low genetic diversity create a favorable environment for pathogen spread, resulting in increased interactions between humans, animals, and wildlife, as well as the possibility of farm animals becoming intermediate hosts, facilitating pathogen transmission [53,54]. Another aspect of zoonoses and infections in general is related to the concept of viral quasispecies, which describes error-prone replication and reflects a sophisticated adaptive replication system that responds to environmental stimuli [55]. Similar to how the immune system of vertebrates expands clonally in response to antigenic stimuli, viral quasispecies also benefit from a molecular memory based on the existence of a dynamic population of complex mutant genomes [56].
This determines the coexistence in the host of a primary sequence (dominant nucleotide sequence) and a range of mutant sequences distinguished by the set of copy errors related to the virus's capacity for replication [57]. The highest mutation rate among living species is found in RNA viruses (between 10⁻³ and 10⁻⁵ errors per nucleotide and replication cycle), followed by retroviruses (which have extremely high mutation rates and exist as complex genetically heterogeneous populations) and DNA viruses (10⁻⁸ to 10⁻⁶ substitutions per replication cycle) [58]. Both the primary sequences and the mutant spectra are extremely short-lived during RNA virus infections because environmental changes or, in the case of SARS-CoV-2, the potential use of vaccines directed against a single protein, can frequently upset the population balance of viral genomes [59]. In addition to functioning as an essential adaptive strategy, the genetic organization of quasispecies has a range of biological effects, some of which are directly related to viral persistence but are not always associated with infection [56]. Infection results from an interaction between the virus, the host, and/or the environment, and can take one of two different forms: acute or persistent [60]. The acute infection strategy allows for a transient infection in which the host's immune response tends to eliminate the infection or prevent its continuation in the same host, following the succession of replicative cycles of the virus. In order to continue the infectious cycle, viruses that belong to this category (e.g., influenza, rhinovirus, and SARS-CoV-2) need to find a new host during the short window of replication. In contrast, virus persistence in a host occurs after an initial phase of replicative infection and the host's antiviral response, during which the virus retains the capacity to replicate continuously or irregularly in the same host for a given amount of time.
The host immune response does not completely eradicate these viruses [60]. Persistence in an organism requires that the virus survive the host immune response, that enough susceptible cells replicate at the same rate as the virus, and that a latent state exists in which viral replicative activity may be partially or completely suppressed for prolonged periods while the ability to reactivate is retained [61]. In this scenario of a complex and ongoing adaptive fight between host and virus, it is critical to develop rapid analysis methods that can pinpoint the fundamental causes of infection and guide possible treatments.
The suggested analytical pipeline revealed that the mere presence of the Togaviridae, Flaviviridae, and Bunyaviridae genomes in the host cell could predict the depletion of particular RBPs and that the depletion of these proteins could change metabolic pathways related to the clinical phenotype. Different positive and negative single-strand RNA viruses can sequester host RBPs to speed up the replication process; this disrupts nucleus-cytoplasmic traffic and leads to a spatial redistribution of proteins from the nucleus to the cytoplasm, altering the host cell network [62,63]. Individual RBP analysis showed that dysregulation is related to clinical manifestations such as neuropathy, weakness, and, in severe cases, encephalitis caused by infection of host neurons. The severity of the virus's effects depends on its virulence and the maturity of the infected neuron. This outcome was also supported by the Enrichr software and a DisGeNET database query, which directly relates the dysregulation of these proteins to clinical manifestations. The investigation also found that all strains carried the conserved first motif, which most likely underlies the more sophisticated and ancient molecular mechanism of infection. The three motifs, which have a specific role in the regulation of viral RNA maturation, could be exploited to develop compounds that limit the removal of these RBPs, hence inhibiting infection. The subcluster enrichment analysis using STRING also highlighted that the alteration of these proteins is related to distinct cancers, especially leukemia. The emergence of malignancies has been linked in the literature to alphaviruses or flaviviruses, e.g., dengue infection.

4. Materials and Methods

The analysis pipeline, which consists of four major steps, was carried out using several online bioinformatics tools.
MEME-ChIP, which performs a comprehensive motif analysis (including motif discovery) on large sets of sequences identified by ChIP-seq or CLIP-seq experiments on human DNA (http://meme-suite.org/tools/meme-chip), was used to analyze the entire genomic sequences of the main Togaviridae, Flaviviridae, and Bunyaviridae strains [18].
All identified motifs were used as queries for Tomtom (http://meme-suite.org/doc/tomtom.html), another MEME suite tool, which compared the motifs to a curated and non-redundant database of experimentally discovered and validated binding sites of human RNA-binding proteins. Using the Benjamini and Hochberg method, Tomtom calculated the q-value, which is the minimal false discovery rate at which the observed similarity would be considered significant [19]. A list of human RBPs that recognize the common conserved domains distributed across the viral genomes was obtained for all motif queries (Figure 1).
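The q-value described above can be illustrated with a minimal sketch of the textbook Benjamini and Hochberg adjustment. This is not Tomtom's own implementation; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each q-value
    # is the minimum adjusted value over all equal-or-larger p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for four motif-to-RBP comparisons.
example_p = [0.01, 0.04, 0.03, 0.5]
example_q = benjamini_hochberg(example_p)
print([round(q, 4) for q in example_q])
```

A match would then be retained when its q-value falls below the chosen false discovery rate threshold; the threshold itself is a choice of the analyst and is not specified here.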
STRING (https://string-db.org/), another software tool, was used to identify the most related correlation within the query RBPs set using a guilt-by-association approach. The bioinformatics tool drew on a large database of functional interaction networks from various organisms, and each related RBP can be traced back to the source network that was used to make the prediction. [20].
The entire list of RBPs was used as input for Enrichr (https://maayanlab.cloud/Enrichr/), a web-based application that integrates different methods for ranking enriched terms with various interactive visualization tools, which display enrichment results using the JavaScript library “Data Driven Documents” (D3). The software also provides various visual summaries of the collective functions of gene lists [21].
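Enrichment tools of this kind typically rank terms by testing whether the query list overlaps a term's gene set more than expected by chance. The sketch below computes the one-sided hypergeometric tail probability, which is equivalent to a one-sided Fisher exact test; it is a simplified illustration rather than Enrichr's implementation, and the function name and the numbers used are hypothetical.

```python
from math import comb


def enrichment_pvalue(overlap, query_size, term_size, background_size):
    """Probability of seeing at least `overlap` term genes in a random query.

    The query is modeled as `query_size` genes drawn without replacement
    from a background of `background_size` genes, of which `term_size`
    are annotated with the term (hypergeometric upper tail).
    """
    max_overlap = min(query_size, term_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers: 3 query genes, 2 of them in a 4-gene term, background of 10 genes.
print(enrichment_pvalue(overlap=2, query_size=3, term_size=4, background_size=10))
```

The resulting p-values across all terms of a library would then be corrected for multiple testing before terms are reported as enriched.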

5. Conclusions

The analysis provides new scenarios for the potential use of these viruses as an alternative therapeutic strategy in cancer patients. It is well known that, in pre-clinical investigations against cancer, recombinant vaccines based on alphaviruses have both preventive and therapeutic efficacy [64]. For instance, blocking the discovered motifs that can bind RBPs could restore the proper availability of the proteins, reducing both the clinical symptoms and, indirectly, the risk of tumor formation. In conclusion, the suggested pipeline is simple to use and easily repeatable. It could be used not only to understand the mechanism of infection and, consequently, to characterize the RBPs involved, but also to identify metabolic networks that point to tissue-specific biomarkers and potential pathway dysfunctions, which are helpful for developing a potential vaccine or therapeutic approach. Implementing these integrated analysis pipelines in a single user-friendly interface could support infection investigations by offering quick access to the analysis of multiple datasets (such as genotype and transcriptome data), particularly during emergencies. Interindividual genetic variation may contribute to some of the observed variability in phenotypic responses, but identifying these regulatory factors should improve the diagnostic sensitivity and accuracy of cohort classification and, as a consequence, facilitate therapy.

Author Contributions

Conceptualization, M.C.; methodology, M.C., ALC and MDM; formal analysis, M.C., ALC and AR; writing—original draft preparation, M.C.; writing—review and editing, ND, LM and AR. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Baker, R.E.; Mahmud, A.S.; Miller, I.F.; Rajeev, M.; Rasambainarivo, F.; Rice, B.L.; Takahashi, S.; Tatem, A.J.; Wagner, C.E.; Wang, L.-F.; et al. Infectious disease in an era of global change. Nat. Rev. Microbiol. 2021, 20, 193–205.
  2. Sohrabi, F.; Saeidifard, S.; Ghasemi, M.; Asadishad, T.; Hamidi, S.M.; Hosseini, S.M. Role of plasmonics in detection of deadliest viruses: a review. Eur. Phys. J. Plus 2021, 136, 1–71.
  3. Rossi, B.; Barreca, F.; Benvenuto, D.; Braccialarghe, N.; Campogiani, L.; Lodi, A.; Aguglia, C.; Cavasio, R.A.; Giacalone, M.L.; Kontogiannis, D.; et al. Human Arboviral Infections in Italy: Past, Current, and Future Challenges. Viruses 2023, 15, 368.
  4. Hernandez, R.; Brown, D.T.; Paredes, A. Structural Differences Observed in Arboviruses of the Alphavirus and Flavivirus Genera. Adv. Virol. 2014, 2014, 1–24.
  5. Mori, Y.; Otsuki, N.; Sakata, M.; Okamoto, K. Virology of the Family Togaviridae. Uirusu 2011, 61, 211–220.
  6. Elmasri, Z.; Nasal, B.L.; Jose, J. Alphavirus-Induced Membrane Rearrangements during Replication, Assembly, and Budding. Pathogens 2021, 10, 984.
  7. Rupp, J.C.; Sokoloski, K.J.; Gebhart, N.N.; Hardy, R.W. Alphavirus RNA synthesis and non-structural protein functions. J. Gen. Virol. 2015, 96, 2483–2500.
  8. Azar, S.R.; Campos, R.K.; Bergren, N.A.; Camargos, V.N.; Rossi, S.L. Epidemic Alphaviruses: Ecology, Emergence and Outbreaks. Microorganisms 2020, 8, 1167.
  9. Diosa-Toro M, Urcuqui-Inchima S, Smit JM. Arthropod-borne flaviviruses and RNA interference: seeking new approaches for antiviral therapy. Adv Virus Res. 2013, 85, 91–111.
  10. Kuno, G.; Chang, G.-J.J.; Tsuchiya, K.R.; Karabatsos, N.; Cropp, C.B. Phylogeny of the Genus Flavivirus. J. Virol. 1998, 72, 73–83.
  11. Ng, W.C.; Soto-Acosta, R.; Bradrick, S.S.; Garcia-Blanco, M.A.; Ooi, E.E. The 5′ and 3′ Untranslated Regions of the Flaviviral Genome. Viruses 2017, 9, 137.
  12. Perera-Lecoin, M.; Meertens, L.; Carnec, X.; Amara, A. Flavivirus Entry Receptors: An Update. Viruses 2013, 6, 69–88.
  13. Amor, S. Virus Infections of the Central Nervous System. Manson’s Tropical Diseases. 2009;853-883.
  14. de Vries, L.; Harding, A.T. Mechanisms of Neuroinvasion and Neuropathogenesis by Pathologic Flaviviruses. Viruses 2023, 15, 261.
  15. Braack, L.; De Almeida, A.P.G.; Cornel, A.J.; Swanepoel, R.; De Jager, C. Mosquito-borne arboviruses of African origin: review of key viruses and vectors. Parasites Vectors 2018, 11, 1–26.
  16. Labuda, M. Arthropod vectors in the evolution of bunyaviruses. . 1991, 35, 98–105.
  17. Liu, J.; Swevers, L.; Kolliopoulou, A.; Smagghe, G. Arboviruses and the Challenge to Establish Systemic and Persistent Infections in Competent Mosquito Vectors: The Interaction With the RNAi Mechanism. Front. Physiol. 2019, 10, 890.
  18. Ma, W.; Noble, W.S.; Bailey, T.L. Motif-based analysis of large nucleotide data sets using MEME-ChIP. Nat. Protoc. 2014, 9, 1428–1450.
  19. Gupta, S.; Stamatoyannopoulos, J.A.; Bailey, T.L.; Noble, W.S. Quantifying similarity between motifs. Genome Biol. 2007, 8, R24.
  20. Szklarczyk, D.; Kirsch, R.; Koutrouli, M.; Nastou, K.; Mehryary, F.; Hachilif, R.; Gable, A.L.; Fang, T.; Doncheva, N.T.; Pyysalo, S.; et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2022, 51, D638–D646.
  21. Xie, Z.; Bailey, A.; Kuleshov, M.V.; Clarke, D.J.B.; Evangelista, J.E.; Jenkins, S.L.; Lachmann, A.; Wojciechowicz, M.L.; Kropiwnicki, E.; Jagodnik, K.M.; et al. Gene Set Knowledge Discovery with Enrichr. Curr. Protoc. 2021, 1, e90.
  22. Goss, D.J.; Kleiman, F.E. Poly(A) binding proteins: are they all created equal? Wiley Interdiscip. Rev. RNA 2012, 4, 167–179.
  23. Peng, Y.; Yuan, J.; Zhang, Z.; Chang, X. Cytoplasmic poly(A)-binding protein 1 (PABPC1) interacts with the RNA-binding protein hnRNPLL and thereby regulates immunoglobulin secretion in plasma cells. J. Biol. Chem. 2017, 292, 12285–12295.
  24. Lemay, J.-F.; Lemieux, C.; St-André, O.; Bachand, F. Crossing the borders: Poly(A)-binding proteins working on both sides of the fence. RNA Biol. 2010, 7, 291–295.
  25. Kini, H.K.; Kong, J.; Liebhaber, S.A. Cytoplasmic Poly(A) Binding Protein C4 Serves a Critical Role in Erythroid Differentiation. Mol. Cell. Biol. 2014, 34, 1300–1309.
  26. Zheng, X.; Peng, Q.; Wang, L.; Zhang, X.; Huang, L.; Wang, J.; Qin, Z. Serine/arginine-rich splicing factors: the bridge linking alternative splicing and cancer. Int. J. Biol. Sci. 2020, 16, 2442–2453.
  27. Elmén, L.; Volpato, C.B.; Kervadec, A.; Pineda, S.; Kalvakuri, S.; Alayari, N.N.; Foco, L.; Pramstaller, P.P.; Ocorr, K.; Rossini, A.; et al. Silencing of CCR4-NOT complex subunits affect heart structure and function. Dis. Model. Mech. 2020, 13.
  28. Ajam, M.; A Abu-Heija, A.; Shokr, M.; Ajam, F.; Saydain, G. Sinus Bradycardia and QT Interval Prolongation in West Nile Virus Encephalitis: A Case Report. Cureus 2019, 11, e3821.
  29. David, G.; Reboutier, D.; Deschamps, S.; Méreau, A.; Taylor, W.; Padilla-Parra, S.; Tramier, M.; Audic, Y.; Paillard, L. The RNA-binding proteins CELF1 and ELAVL1 cooperatively control the alternative splicing of CD44. Biochem. Biophys. Res. Commun. 2022, 626, 79–84.
  30. Wu, K.; Ahmad, T.; Eri, R. LIN28A: A multifunctional versatile molecule with future therapeutic potential. World J. Biol. Chem. 2022, 13, 35–46.
  31. Bantle, C.M.; Phillips, A.T.; Smeyne, R.J.; Rocha, S.M.; Olson, K.E.; Tjalkens, R.B. Infection with mosquito-borne alphavirus induces selective loss of dopaminergic neurons, neuroinflammation and widespread protein aggregation. npj Park. Dis. 2019, 5, 1–15.
  32. Malik, A.M.; Barmada, S.J. Matrin 3 in neuromuscular disease: physiology and pathophysiology. J. Clin. Investig. 2021, 6.
  33. Fan, X.; Zhao, Z.; Ma, L.; Huang, X.; Zhan, Q.; Song, Y. PTBP1 promotes IRES-mediated translation of cyclin B1 in cancer. Acta Biochim. et Biophys. Sin. 2022, 54, 696–707.
  34. Vavougios, G.D. SARS-CoV-2 dysregulation of PTBP1 and YWHAE/Z gene expression: A primer of neurodegeneration. Med Hypotheses 2020, 144, 110212–110212.
  35. Long, L.; Thelen, J.P.; Furgason, M.; Haj-Yahya, M.; Brik, A.; Cheng, D.; Peng, J.; Yao, T. The U4/U6 Recycling Factor SART3 Has Histone Chaperone Activity and Associates with USP15 to Regulate H2B Deubiquitination. J. Biol. Chem. 2014, 289, 8916–8930.
  36. Sherman, E.J.; Mitchell, D.C.; Garner, A.L. The RNA-binding protein SART3 promotes miR-34a biogenesis and G1 cell cycle arrest in lung cancer cells. J. Biol. Chem. 2019, 294, 17188–17196.
  37. Velasco, B.R.; Izquierdo, J.M. T-Cell Intracellular Antigen 1-Like Protein in Physiology and Pathology. Int. J. Mol. Sci. 2022, 23, 7836.
  38. Schott, G.; Galarza-Muñoz, G.; Trevino, N.; Chen, X.; Weirauch, M.T.; Gregory, S.G.; Bradrick, S.S.; Garcia-Blanco, M.A. U2AF2 binds IL7R exon 6 ectopically and represses its inclusion. RNA 2021, 27, 571–583.
  39. Ishii, H.; Saitoh, M.; Sakamoto, K.; Kondo, T.; Katoh, R.; Tanaka, S.; Motizuki, M.; Masuyama, K.; Miyazawa, K. Epithelial Splicing Regulatory Proteins 1 (ESRP1) and 2 (ESRP2) Suppress Cancer Cell Motility via Different Mechanisms. J. Biol. Chem. 2014, 289, 27386–27399.
  40. Gupta, K.; Metgud, R. Evidences Suggesting Involvement of Viruses in Oral Squamous Cell Carcinoma. Pathol. Res. Int. 2013, 2013, 1–17.
  41. Diosa-Toro, M.; Kennedy, D.R.; Chuo, V.; Popov, V.L.; Pompon, J.; Garcia-Blanco, M.A. Y-Box Binding Protein 1 Interacts with Dengue Virus Nucleocapsid and Mediates Viral Assembly. Mbio 2022, 13, e0019622.
  42. Aliakbari, F.; Eshghifar, N.; Mirfakhraie, R.; Pourghorban, P.; Azizi, F. Coding and Non-Coding RNAs, as Male Fertility and Infertility Biomarkers. 15.
  43. Kuassivi, O.N.; Abiven, H.; Satie, A.-P.; Cartron, M.; Mahé, D.; Aubry, F.; Mathieu, R.; Rebours, V.; Le Tortorec, A.; Dejucq-Rainsford, N. Human Testicular Germ Cells, a Reservoir for Zika Virus, Lack Antiviral Response Upon Zika or Poly(I:C) Exposure. Front. Immunol. 2022, 13, 909341.
  44. Adinolfi, S.; Bagni, C.; Musco, G.; Gibson, T.; Mazzarella, L.; Pastore, A. Dissecting FMR1, the protein responsible for fragile X syndrome, in its structural and functional domains. RNA 1999, 5, 1248–1258.
  45. Katoh, H.; Mori, Y.; Kambara, H.; Abe, T.; Fukuhara, T.; Morita, E.; Moriishi, K.; Kamitani, W.; Matsuura, Y. Heterogeneous Nuclear Ribonucleoprotein A2 Participates in the Replication of Japanese Encephalitis Virus through an Interaction with Viral Proteins and RNA. J. Virol. 2011, 85, 10976–10988.
  46. Liao, K.-C.; Chuo, V.; Ng, W.C.; Neo, S.P.; Pompon, J.; Gunaratne, J.; Ooi, E.E.; Garcia-Blanco, M.A. Identification and characterization of host proteins bound to dengue virus 3′ UTR reveal an antiviral role for quaking proteins. RNA 2018, 24, 803–814.
  47. Lim, Y.W.; James, D.; Huang, J.; Lee, M. The Emerging Role of the RNA-Binding Protein SFPQ in Neuronal Function and Neurodegeneration. Int. J. Mol. Sci. 2020, 21, 7151.
  48. Yuan, M.; Yu, C.; Chen, X.; Wu, Y. Investigation on Potential Correlation Between Small Nuclear Ribonucleoprotein Polypeptide A and Lung Cancer. Front. Genet. 2021, 11.
  49. Wang, Q.; Wang, F.; Zhong, W.; Ling, H.; Wang, J.; Cui, J.; Xie, T.; Wen, S.; Chen, J. RNA-binding protein RBM6 as a tumor suppressor gene represses the growth and progression in laryngocarcinoma. Gene 2019, 697, 26–34.
  50. Grifone, R.; Saquet, A.; Desgres, M.; Sangiorgi, C.; Gargano, C.; Li, Z.; Coletti, D.; Shi, D.-L. Rbm24 displays dynamic functions required for myogenic differentiation during muscle regeneration. Sci. Rep. 2021, 11, 1–15.
  51. Baker, R.E.; Mahmud, A.S.; Miller, I.F.; Rajeev, M.; Rasambainarivo, F.; Rice, B.L.; Takahashi, S.; Tatem, A.J.; Wagner, C.E.; Wang, L.-F.; et al. Infectious disease in an era of global change. Nat. Rev. Microbiol. 2021, 20, 193–205.
  52. Hassell, J.M.; Begon, M.; Ward, M.J.; Fèvre, E.M. Urbanization and Disease Emergence: Dynamics at the Wildlife–Livestock–Human Interface. Trends Ecol. Evol. 2016, 32, 55–67.
  53. Ellwanger, J.H.; Chies, J.A.B. Zoonotic spillover: Understanding basic aspects for better prevention. Genet. Mol. Biol. 2021, 44, e20200355.
  54. Dafale, N.A.; Srivastava, S.; Purohit, H.J. Zoonosis: An Emerging Link to Antibiotic Resistance Under “One Health Approach”. Indian J. Microbiol. 2020, 60, 139–152.
  55. Singh, K.; Mehta, D.; Dumka, S.; Chauhan, A.S.; Kumar, S. Quasispecies Nature of RNA Viruses: Lessons from the Past. Vaccines 2023, 11, 308.
  56. Domingo, E.; Sheldon, J.; Perales, C. Viral Quasispecies Evolution. Microbiol. Mol. Biol. Rev. 2012, 76, 159–216.
  57. Domingo E, Escarmís C, Menéndez-Arias L, et al. Viral Quasispecies: Dynamics, Interactions, and Pathogenesis. Origin and Evolution of Viruses. 2008;87-118.
  58. Peck, K.M.; Lauring, A.S. Complexities of Viral Mutation Rates. J. Virol. 2018, 92.
  59. Almehdi, A.M.; Khoder, G.; Alchakee, A.S.; Alsayyid, A.T.; Sarg, N.H.; Soliman, S.S.M. SARS-CoV-2 spike protein: pathogenesis, vaccines, and potential therapies. Infection 2021, 49, 855–876.
  60. Pesti, R.; Kontra, L.; Paul, K.; Vass, I.; Csorba, T.; Havelda, Z.; Várallyay. Differential gene expression and physiological changes during acute or persistent plant virus interactions may contribute to viral symptom differences. PLOS ONE 2019, 14, e0216618.
  61. E Randall, R.; E Griffin, D. Within host RNA virus persistence: mechanisms and consequences. Curr. Opin. Virol. 2017, 23, 35–42.
  62. Gilbertson, S.; Federspiel, J.D.; Hartenian, E.; Cristea, I.M.; Glaunsinger, B. Changes in mRNA abundance drive shuttling of RNA binding proteins, linking cytoplasmic RNA degradation to transcription. eLife 2018, 7.
  63. Chetta, M.; Tarsitano, M.; Oro, M.; Rivieccio, M.; Bukvic, N. An in silico pipeline approach uncovers a potentially intricate network involving spike SARS-CoV-2 RNA, RNA vaccines, host RNA-binding proteins (RBPs), and host miRNAs at the cellular level. J. Genet. Eng. Biotechnol. 2022, 20, 1–11.
  64. Lundstrom, K. Alphaviruses in Immunotherapy and Anticancer Therapy. Biomedicines 2022, 10, 2263.
Figure 1. (A) Conserved sequences found by comparing all of the strains. All consensus motifs are reported as motif logos, following IUPAC nomenclature, together with the complete list of RBPs. (B) Distribution of RBP binding sites on the arbovirus sequences analyzed. The colors red, green, and light blue correspond to the motifs shown in (A).
Figure 2. (A) The graph displays the connections among all identified RBPs, which are grouped by the motif to which they can bind. (B) The set of proteins in the network is provided in the table under “Count in Network”. Strength (log10 of observed/expected) indicates the size of the enrichment effect: it is the ratio between the number of proteins in the query that are annotated with a term and the number of proteins expected to be annotated with that term in a random network of the same size. The false discovery rate indicates the significance of the enrichment; the p-values shown are corrected for multiple testing within each category using the Benjamini-Hochberg procedure.
Figure 3. (A) Bar chart of the top 4 enriched terms in the KEGG_2021 library, along with their corresponding p-values. Colored bars correspond to terms with significant p-values (<0.05). An asterisk (*) next to a p-value indicates that the term also has a significant adjusted p-value (<0.05). (B) Bar chart of the top 10 enriched terms in the GO biological process library. (C) UMAP scatterplot of the enrichment analysis for the DisGeNET database. Each point represents a term in the library. Term frequency-inverse document frequency (TF-IDF) values were calculated for the gene set associated with each term, and the resulting values were then projected with UMAP. Terms are plotted on the first two UMAP dimensions, so that terms with more similar gene sets tend to lie closer together. Clusters are selected automatically from the TF-IDF data using the Leiden algorithm. The larger and darker the point, the more significantly the term is enriched for the query.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.