1. Introduction
In a recent article [
1],
starting from experimental data in the literature on the liver of patients with
covid, we discriminated, through interactomic analysis and reverse engineering,
626 human proteins that interact physically and functionally with SARS-CoV-2
proteins. These interactions are specific to the liver, despite the broad
organotropism of the virus [
2]. What surprised
us was that the S1 subunit of Spike can directly act with the human proteome,
through one-to-one interactions, after its uptake on the cell surface. S1 has
also been found in far organs [
2,
3,
4] for
lengthy periods [
5,6], and its interaction
with single human proteins seems organ-specific. After all, there are many
reports on the long permanence of the virus [
7]
and there are also recent confirmations that even concern sperm cells [
8]. The virus affects sperm count and gamete
quality even after 3 months and although the virus is not present in the
infected men’s semen; it is intracellularly present in spermatozoa. Although
this is a small study, it supports the many reports that the virus also affects
distant organs such as the reproductive system [
9].
We also highlighted that the modes of interaction
between viral and human proteins are of two types [
1,
10].
Most times, groups of different viral proteins attack single human proteins, or
single proteins of molecular complexes. Unfortunately, the general lack of
space-temporal information, i.e., where this phenomenon occurs and with what
chronology severely limits description and identification of the molecular
mechanisms involved in these multi-to-one processes [
10].
It is important to note that while the viral
proteins involved in these one-to-one interactions also take part in
multi-to-one attacks together with other viral proteins, human proteins
involved in one-to-one reactions are specifically involved only in this type of
interaction. Among the various viral proteins involved in this peculiar
interaction, we noted that in liver, the Spike S1 subunit protein interacted
specifically with 12 human proteins, but we did not investigate further.
However, this finding makes this set of viral and human proteins valuable
because the information that is drawn is specific and reflects the viral
strategy. In another our article [
11],
starting from 146 proteins highly significant for physical interactions
experimentally proven with the Spike S1 subunit protein, through interactomics
and reverse engineering, we isolated 27 human proteins that interact one-to-one
with S1. These interactions are not because of a specific organ but appear to
be widespread in the human organism.
The S1 protein is peculiar because researchers have
found it free for long periods in the human organism, both during and after
covid, but also after vaccination [
12,
13,
14,
15]. The
mRNA of this protein is used to prepare vaccines, and even if modified, it
accesses the same cellular mechanisms that the SARS genome accesses after its
entry, producing the same protein with the same molecular process. Therefore,
this study focused on the autonomous activities of S1 that occur in the liver,
even because the protein appears to be involved in molecular processes related
to HBV and HCC. We have used the 12 human proteins interacting with S1 in the
liver as seed to extract their functional relationships by proteome enrichment.
While the interactome of the other 27 human proteins, also found interacting
with S1, was used to compare results. Results surprisingly showed that S1
induced the expression of many human proteins underlying the molecular
processes involving HBV and HCC. These two diseases show some molecular
processes in common. Evolutionary considerations led us to think that, although
the viral strategy induces the expression of the molecular components of these
two pathologies, it is the set of biological interactions with the individual
human phenotypes that favors, or hinders, their progression. All this also
suggests how to use these additional aspects.
2. Materials and Methods
2.1. BioGRID
BioGRID (
https://thebiogrid.org/) is an important
biomedical database that collects curated protein and genetic interactions,
only from experimental studies and living cells [
16].
Therefore, it represents a fundamental and unique resource for obtaining data
on certified functional interactions in biological contexts. Through a specific
Project (BioGRID COVID-19 Coronavirus Curation Project), BioGRID maintains
complete and continuous coverage of protein interaction data between human
proteins and all SARS-CoV-2 proteins. The Project is still active
(
https://thebiogrid.org/project/3) and provides comprehensive datasets of
curated direct interactions for the viral proteins encoded by SARS-CoV-2. We
accessed the area SARS-CoV-2 Protein Interactions in October 2024.
BioGRID manages and integrates interaction data
from low- and high-workflow experiments through a data curation and
standardization process. This involves the analysis and validation of data from
both types of experiments to ensure the quality and reliability of the
information in the database.
2.2. STRING
STRING (Search Tool for the Retrieval of
Interacting Genes/Proteins database) (
https://string-db.org/) Version 12.0 is a
database of predicted interactions for different organisms [
17,
18]. The interactions include direct (physical)
and indirect (functional) associations; they stem from computational
prediction, from knowledge transfer between organisms, and from interactions
aggregated from other (primary) databases. STRING is a database of known and
predicted protein-protein interactions. The interactions include direct
(physical) and indirect (functional) associations; they stem from computational
prediction, from knowledge transfer between organisms, and from interactions
aggregated from other (primary) databases. It considers conserved genomic
neighborhood, gene fusion events and co-occurrence of genes across genomes, as
well as information about orthologs. STRING quantifies the strength of the
evidence supporting each interaction by assigning it a confidence score. This
score is a combination of several sub-scores (based on seven channels of
evidence), each of which is calculated in a personalized and source-specific
way.
2.3. Protein Enrichment
It relies to some extent on prior knowledge, and
statistical enrichment of annotated features may not be an intrinsic property
of the input. To get a statistically valid enrichment test from STRING, we
input the entire set of enriched proteins into STRING, ensuring that “first
shell” and “second shell” are both set to “none”. To confirm the correctness of
the procedure, we also checked the STRING annotation, which disappears when the
analysis is performed correctly. Next, we introduce new interaction partners to
the network to expand the interaction neighborhood according to the desired
confidence score. We used 0.9 as the confidence score. We always added 1st
order proteins (direct interactions) first and then 2nd order proteins
(indirect interactions), when necessary.
2.4. Cytoscape and Network Topology Analysis
Cytoscape [
19,
20]
through Network Analyzer was used to analyze the topological parameters of
networks. Using Cytoscape software (Version 3.10.1), we visualized and analyzed
PPI networks, which offer diverse plugins for multiple analyses. Cytoscape
represents PPI networks as graphs with nodes illustrating proteins and edges
depicting associated inter-actions. We examined network architecture for
topological parameters such as clustering coefficient, centralization, density,
network diameter, and so on. Our analysis included undirected edges for every
network. We termed the number of connected neighbors of a node in a network as
the degree of a node. P(k) is used to describe distributing node degrees, which
counts the number of nodes with degree k where k = 0, 1, 2, …. We calculated
the power law of distribution of node degrees, which is one of the most crucial
network topological characteristics. The coefficient R-squared value (R2), also
known as the coefficient of determination, gives the proportion of variability
in the dataset. We also examined other network parameters, including the
distribution of various topological features. We performed a calculation of
high ranked nodes based on relevant topological parameters [
21].
2.5. Highlighting the Nodes of a STRING Network Involved in the Same Biological Process (GO)
STRING makes visible all the nodes involved in the
same biological process evidenced through its databases mapped onto the
proteins (GO, KEGG, REACTOME, and so on) by activating the process itself with
a click of the cursor on the process line. Activation means that all nodes
involved in the same metabolic process have the same color. Nodes involved in
multiple processes receive multiple colors. This tool is very useful when one
wants to analyze involving multiple nodes in many metabolic processes, distinguishing
the effect of different processes between nodes, and identifying which nodes
represent the crossing points. If individual nodes do not show any coloration
after clicking, this identifies certain components of a path, or group, that a
specific activated process does not influence. The relationships that determine
the coloring of the nodes depend on the knowledge base that STRING organizes
for a specific network by extracting data and information from the scientific
literature in PubMed.
2.6. Enrichment Analysis
When using STRING for enrichment analysis
(Biological Process (GO), KEGG Pathways and Reactome Pathways), changing the
independent variable to either gene count or signal strength can lead to
differences because each parameter emphasizes distinct aspects of the data.
Gene Count: Using gene count as the main parameter
emphasizes the quantity of genes involved in each process. This approach often
identifies broad processes or pathways that involve many genes. Higher gene
counts can sometimes show more general biological processes or pathways, as
they involve multiple players to cover broader functional areas.
Signal Strength: signal strength highlights
processes based on the "weight" or impact of the connections within
the network, often considering the interaction confidence and association
strength. This parameter is likely to favor pathways or processes where
interactions are more robustly supported by data, even if they involve fewer
genes. Signal-based analyses can thus highlight more specialized pathways or
those with high relevance because of strong interactions. Because each approach
weights aspects of the data differently, the resulting enriched terms may vary
despite both analyses being statistically significant. This divergence often
highlights how certain biological processes can be more statistically
associated either with a higher number of contributing genes or with highly
specific, intense interactions.
The analyses were set according to the following
parameters: Similarity: > 0.9; Maximum FD shown: 0.0001; Minimum count in
the network: 3; Minimum intensity shown: 0.75; Minimum signal shown: 0.5. All
functions shown by STRING are significant, having a p-value < 1x10−27.
2.7. Data Merging
Data Merging is a process in data management, used
to coalesce multiple related datasets into one. The data-merging approach pools
all data together and then estimates statistics on the resulting dataset of GO
terms. The merging process enables the use of this combined data for more
effective analysis, particularly for extensive sets [
22].
Data Merging merges disparate data sources, such as databases, or experiments
data, into a unified dataset. We have used Excel for calculations. It aids in
improving the accuracy of statistical data analysis, filling missing values in
datasets, identifying correlations between variables, and making the data
cleaning process more efficient. This procedure also presents some challenges.
These include handling large datasets, ensuring the correct alignment of merged
data, and dealing with ambiguities when datasets have similar identifiers.
These issues, if not dealt with carefully, can lead to data inconsistency or
incorrect data interpretation. We have used this approach in integrating
diverse data from various interactome analyses and data sources. The performance
depends on the size of the datasets being merged and the computational
resources available. With adequate resources, it is usually efficient and
quick, providing a unified data view in relatively little time. We have used a
storage repository that holds a vast amount of raw data in its native format
(Data Lake).
3. Results
3.1. Starting Conditions
For this study, we used both the 12 human proteins
got from our previous liver paper [
1] and, as
a reference, the interactomic data of the 27 human proteins got from the recent
whole-organism S1 study [
11]. These human
proteins have in common the one-to-one interaction with S1. All these
interactions are not speculative, because they are based on physical
interactions got from experimental studies, including cryo-EM structures,
co-immunoprecipitation assays and functional validation, which revealed the
full set of virus-host interactions. Curators collected these interactions in
vivo from various model cell systems and visualized them individually in
BioGRID (https://thebiogrid.org/search.php?search=SARS-CoV-2*&organism=2697049).
The S1 dataset is part of the “BioGRID COVID-19 Coronavirus Curation Project -
Severe acute breathing syndrome coronavirus 2”, which covers all viral proteins
(32 results), each organized by significance levels. The full set covers 41,683
protein-protein physical interactions (October 2024), where the S1 protein
alone shows 3,840 interactions with 2,012 specific interactors and 41 PTM
sites. These characteristics give the S1 protein an enormous capacity for
interaction within the human proteome.
The comparison (
Table 1S in Supplements) shows that only 4 of them are in common between the two sets. Both the numerical difference and the few proteins in common suggest that the two sets derive from different metabolic contexts. This is characteristic of the wide variety of metabolic scale relationships that exist in a complex metabolic system such as the human organism, where each relationship depends on the metabolic context in which events occur [
23]. We used this set of 12 proteins as a functional seed to extract from the human proteome the relationships in which they are involved.
The aim of this study is to find out whether S1 can reproduce the same conditions found for free S1 also in the liver, regarding the molecular components that characterize the pathological processes of HBV and HCC. The liver is the major organ where both pathologies occur. We want to find out whether both pathologies coexist with an independent development or whether there is an overlap of similar molecular components that would lead to the development of only one of them.
3.1.1. Interactome-12
Figure 1 reports the interactome resulting from the enrichment of the 12 human proteins that interact specifically with S1 in the liver.
Only ten of the twelve proteins are involved in the interactome, which from now on we will call interactome-12. We pruned the two non-interacting proteins (S100A8 and TMPRSS2). However, the relatively low number of total nodes (676) compared to an enrichment of 500 first order plus 500 second order proteins is a concrete sign that not all interactions have a robust experimental basis to ensure the reliability of their role in cellular functions [
24,
25]. This prompted us to increase the statistical significance of the results by operating on the enrichment parameters setting FDR: <0.0001 and strength: 0.75. This action reduces the number of insignificant and nonspecific terms in protein-protein interactions [
26]. While in the last years the mention of PPIs as a topic in scientific articles has increased exponentially, the number of articles presenting real demonstrations of interactions using appropriate technologies has increased very little and only linearly [
27].
3.1.2. Main Features of the Interactome-12
Excel file 1 shows node-degrees and PRKACB with 122 links is the main hub. According to Barabasi et al., [
21,
28,
29], we can consider as the highest-ranking nodes or Hub-nodes about fifteen nodes, degrading up to 70-60 links. They are all kinases of various families. The scope of action of PRKACB is functionally very broad. Its activation regulates diverse cellular processes (cell proliferation, cell cycle, differentiation, and regulation of microtubule dynamics), chromatin condensation and decondensation, nuclear envelope disassembly, and reassembly, as well as regulation of intracellular transport mechanisms and ion flux [
30]. So, we can find this protein (cAMP-dependent protein kinase catalytic subunit beta) in the cytoplasm, cell membrane, and nucleus. These are all signs of extensive metabolic activity (
https://www.proteinatlas.org/ENSG00000142875-PRKACB). We can say the same about the various families of kinases that appear as high-ranking nodes of interactome-12. This interactome follows a scale free power law (
figure 1S, Supplements).
3.1.3. Analysis of KEGG Terms hsa 05161-Hepatitis B and hsa 05225-Hepatocellular Carcinoma
The category expressing KEGG terms shows some of them involved in HBV and HCC. These terms are KEGG specific. However, these terms appeared also in the liver interactome [
1]. HBV and HCC signals were already present in the KEGG terms of liver interactome, but the primary aim of that paper was to define the biological validity of that interactome and its topological features. In the present results, HBV (hsa05161 Hepatitis B; 90 of 158 network genes; strength: 1.22; signal: 6.91; FDR: 2.37x10-67) and HCC (hsa05225 Hepatocellular carcinoma; 54 of 161 network genes; strength: 0.99; signal: 3.69; FDR: 1.38x10-31) involve a high overall number of genes (144), when compared to the total number of genes in the interactome (676). 31 of these genes are in common and represent 57.4% of those of HCC and 34.4% of HBV.
Table 2S (Supplements) shows these genes. It is difficult to make predictions on their physio-pathological trends because both terms are highly significant even if the hepatitis genes seem more many. This significant overlap between HBV and HCC-related genes suggests that both processes share molecular pathways. The asymmetry in the overlap (57.4% HCC genes common with HBV vs. 30% HBV genes common with HCC) points to the possibility that HCC may leverage molecular mechanisms initially triggered by HBV, but with additional factors specific to cancer progression. These topics are of extraordinary importance because of their intrinsic and multiple implications on the health of infected people [
31,
32,
33,
34,
35,
36,
37] and for the long-term consequences after covid [
38,
39,
40,
41].
Several possibilities could explain these results:
Common Pathways to Divergent Outcomes: HBV and HCC may share early molecular triggers, particularly related to inflammation, immune evasion, or cell survival [
42]. However, HCC would require additional oncogenic events (mutations, dysregulated signaling) that go beyond the viral impact, resulting in its independent progression.
Staged Evolution of Disease: It's possible that HBV creates a favorable environment for HCC development [
43,
44], with S1 inducing early changes that lead to hepatitis but also laying the groundwork for carcinogenesis in susceptible cells. The shared genes might represent pathways involved in liver damage, inflammation, and immune signaling that predispose cells to oncogenic transformation.
Independent Evolution of Overlapping Pathways: Though HBV and HCC share pathways, they may develop independently once started [
45,
46]. HBV may follow a chronic inflammatory or immune-evasion route, while HCC could progress through mutations and other cancer-related alterations, despite initial similarity in gene expression patterns.
Both KEGG pathways involve immune modulation, apoptosis, and cell proliferation. We find genes like TP53 [
47], RB1 [
48], and STAT3 [
49] that are crucial in both inflammation (hepatitis) and cancer progression. But we also find oncogenes and tumor suppressors like CTNNB1, GADD45A/B, and EGFR [
50], which are more linked to cell proliferation, mutation repair failure, and cancer progression. But we also find genes like TLR3, IFNA1, or JAK1 [
51] that are heavily involved in immune response pathways specifically for HBV. They reflect roles in antiviral response, immune system activation, or liver-specific inflammation.
Figures 2S (Supplements) show the impact of nodes related to HBV and HCC in the interactome-12. The nodes are a large and integral part of the central interactomic core. Many are involved in both pathologies. This makes any analysis very complex because of the many overlaps that generate an intricate metabolic system. The interactome-814 is shown in
Figure 3S and table 3S similarly to the presentation of the interactome-12. The tables show both interactomes display overall quite similar genes. Even the genes shared between the two different interactomes are similar.
By comparing the functional results of these interactomes for HBV and HCC genes, we will probably be able to gain a clearer picture of whether both processes develop independently or whether one prevails. In conclusion, considering the genes involved in HBV and HCC of both interactomes, we see quite similar results, although with some differences. There is no typical data that can reliably suggest the possible evolution of the molecular processes involved in one or both pathologies.
3.2. Comparisons Between Enrichment Analysis Terms
To clarify these results, in
Table 4S (Supplements) we show a comparison between the enrichment analyses got from the three (GO) terms and the KEGG term of each interactome. The number of genes is a macroscopic functional parameter that could identify the general trend of the two pathologies without bringing into play similar specific processes. By restricting some enrichment parameters, the analysis achieved a greater significance in the results, even if it eliminated some processes. In addition, we used the number of genes contributing to the same terms as the abscissa. To get this result, we used the new semantic analysis function of STRING (similarity > 0.8), which groups all the genes of the many processes that perform a similar functional activity. Human cellular systems often operate in a coordinated manner in the 3D cellular space, to manage similar metabolic actions in various tissues, essential for their physiological function [
52].
The comparison of the three GO terms still shows similar biological, molecular and cellular actions for the two interactomes. However, this was to be expected since the human cellular systems of the various tissues perform quite similar metabolic/functional management actions. The higher total number of its nodes can explain an average difference of around 15% in the number of genes involved in the various processes in favor of the interactome-814 (814 nodes vs. 676).
The comparison between KEGG pathways highlights the strong and significant presence of tumor gene pathways that in this analysis appear both in terms of number of genes involved and in statistical terms preponderant compared to that expressed by HBV. It is important to note that the semantic-based analysis groups similar functionally genes scattered in each single term, thus highlighting the most important functions. During the reworking phase of the interactomes (pruning), and in the phase of functional analyses for categories, the operations of changing the enrichment display settings (eliminating those with lower values) decreased the number of nodes, as well as interactions, and terms in the categories, while increasing the overall significance. We therefore conclude that based on these interactomic models the processes related to HCC seem to be preponderant and probably with a greater propensity to progression compared to those of HBV. In fact, both the PI3K-AKT signaling pathway and MAPK signaling, involved in cancer progression, are significantly present. However, HBV is also based on the involvement of many genes (over 100) with an appreciable statistic (FDR < 1.0x10−69) that strongly reduces its distance from HCC but does not exclude it.
In conclusion, the functional analyses attempted so far show unclear results because of the intrinsic complexity of the interacting systems, making it difficult to discriminate between overlapping processes and components involved. Probably a way to have clear and reliable answers is not to look at the components at play in the various molecular processes, but to look at the final actions that these pathologies can determine, the various cell death that hepatitis and cancer involve.
3.3. Analysis of the Cell Death Present in the Interactomes Examined
One way to distinguish which pathophysiological process is prevalent is by evaluating the dominant modes of cell death and their implications. Cell death can occur through various pathways, each with a distinct pathophysiological significance. Each mode of cell death plays a unique role in health and disease. Understanding these mechanisms can provide insights into various pathological conditions but also help us understand the viral strategy. The molecular complexity of cell death in HCC and HBV infection is partly common (apoptosis) but often accompanied by different and well-differentiated molecular events [
53,
54]. The expected Type of Cell Death for HCC are Apoptosis, Necrosis, and autophagy [
55,
56], while for HBV infection we should expect apoptosis, necroptosis, and pyroptosis as types of cell death [
57,
58]. If one pathway is compromised, the pathways tightly cross-regulate apoptosis, necrosis, and pyroptosis, leading to coordinated cell death [
59,
60,
61,
62]. Sometimes, HCC cells may activate autophagy as a survival mechanism to manage stress and nutrient deprivation. Therefore, the ability of HCC to evade apoptosis and regulate autophagy can contribute to cancer development [
63]. Tumors can also experience necrosis because of rapid growth outpacing blood supply, leading to ischemia and cell death. Necrosis leads to nonspecific systemic inflammation [
64]. Although in cancer genetic alterations play an important role in the virus-host interaction, metabolism and immunology determine the fate of the viral action, because the viral strategy depends on the host and its phenotype [
65,
66]. With HBV infection, hepatocytes can undergo apoptosis, often mediated by the immune response as the body attempts to clear the virus [
67]. This is the common response in acute HBV. If apoptosis is inhibited, causing inflammatory responses and liver damage, necroptosis may also occur in cases of chronic HBV infection [
68]. Separately, we must consider anoikis, a crucial biological process that counteracts cancer metastasis, since it promotes apoptosis of cells that detach from the extracellular matrix [
69]. Tumor cells employ mechanisms to overcome anoikis, promoting invasiveness and metastasis. These mechanisms may include cellular acidosis and stromal changes [
70]. However, anoikis plays a crucial role in cancer progression, as tumor cells employ various strategies to circumvent this form of cell death and gain metastatic capability [
71].
Figures 4S and 5S and Tables 5S and 6S (Supplements) show both interactomes with the proteins involved in the various programmed death implemented by tissue cellular processes following the action of S1. The analysis shows a variety of statistically significant cell deaths, although not yet attributable with certainty to a single pathology. Surprisingly, the interactome-12 shows a richer presence of terms related to cell death processes, mainly among Biological Processes (GO). Many of them have a relation to autophagy. The two networks show, in color, the genes involved, which are almost all in the central core where the main metabolic activities take place, and the speed of reactions is greatest.
These representations are still at a rather high level to draw reliable conclusions. Both HBV and HCC involve a high number of genes, and significant gene redundancy is expected. In fact, both networks show many single nodes involved in multiple cell death processes, a phenomenon that produces gene redundancy.
3.4. Data Merging
The two interactomes originate from different metabolic contexts and, as discussed, show different functional enrichments for each category. Previous analyses have shown that cell deaths essentially concentrate their effects on three categories: Biological Processes (GO), KEGG pathways, and Reactome Pathways. However, multiple and overlapping interactions within each interactome generate complex regulatory nets, where some genes can directly or indirectly influence many other biological processes. We compared these categories of the two interactomes through a Data Merging approach (details in Methods), combining the different datasets into one (see Excel file 2, sheets 1, 2, and 3). Data Merging is used to evaluate interaction parameters, add observations, and find repetitions. Therefore, the merging is the logic we used to distinguish common processes (coupled processes) from individual processes (uncoupled processes) of each interactome within examined categories. Merging optimizes the process by collecting all the data into a single set, maximizing the ability to extract and analyze critical information with thoroughness.
The Excel file 2 shows the merge among Biological Processes, KEGG Pathways, and Reactomes of the two interactomes. This file displays the terms of the tree categories in three different sheets. Data are all highly significant because of the parametric restrictions applied. The various sheets report for each category all the genes involved in the individual terms. Apoptosis is the prevalent programmed death of all cell types but accompanied by abundant phenomena of cell necrosis and autophagy, which are characteristic of cell death by cancer. Since both apoptosis and autophagy can play a dual role in cancer, either promoting or inhibiting tumor progression, understanding their co-occurrence with necrosis could reveal important insights into disease mechanisms. This approach could help clarify whether the observed processes are distinct or part of a continuum that contributes to a single pathology. Considering the interaction between these different cell deaths could provide valuable insights into how the S1 subunit affects liver cells and could guide therapeutic strategies. In both interactomes, only one KEGG term (hsa04217 Necroptosis) refers to necroptosis but with numerically limited genes that are also in common with other processes (they have multiple staining). Referring specifically to the liver, the total number of genes involved in the various cell deaths is high (963 genes), but with many redundancies. Therefore, we eliminated the redundancies, getting 220 genes that should be those specifically controlling the various cell death in liver tissue.
3.5. Genes That Control Cell Death in the Liver
The Data Merging approach allowed us to isolate 220 genes (see
table 7S, Supplements) specifically involved in the control and implementation of how the liver manages the death of its cells during covid. The genes that program cancer death appear to be more many than those of HBV. However, even if with variable and different relationships, genes of both diseases always appear to be reciprocally involved.
Figure 2 shows the interactome calculated by STRING for these genes. The interactome is statistically very reliable, with a significant number of interactions, but it appears not very compact. Despite being well connected, gene sub-domains are clearly distinguished. Three nodes were not connected, so we pruned them. These general characteristics seem to further support the coexistence of the two pathological states in liver cells under the action of S1. However, we analyzed Biological Processes, KEGG Pathways, and Reactome pathways also for this network.
When using STRING for enrichment analysis, changing the independent variable to either gene count or signal strength can lead to differences because each parameter emphasizes distinct aspects of the data. Using gene count as the main parameter emphasizes the quantity of genes involved in each process. This approach often identifies broad processes or pathways that involve many genes. Signal strength highlights processes based on the "weight" of the connections within the network, considering a weighted average between the interaction confidence and association strength. This parameter is likely to favor processes where interactions are more robustly supported by data, even if they involve fewer genes. Signal-based analyses can thus highlight more specialized pathways or those with high relevance because of strong interactions. Because each approach weights aspects of the data differently, the resulting enriched terms may vary despite both analyses being statistically significant. This divergence often highlights how certain biological processes can be more statistically associated either with a higher number of contributing genes or with highly specific, intense interactions [
72,
73,
74].
This dual approach allows for a more comprehensive understanding of the underlying biology. Table 2 shows the comparison among Biological Process (GO), KEGG Pathways, and Reactome Pathways.
Table 2. - Comparison among enriched terms of the 220 nodes interactome.
The terms being compared are: Biological Process (GO), KEGG Pathways, Reactome Pathways.
Analysis parameters: Similarity: > 0.9; Maximum FDR shown: 0.0001; Minimum count in network: 3; Minimum strength shown: 0.75; Minimum signal shown: 0.5.
The insights gained from gene count and signal strength enrichments provide a nuanced perspective, as each metric captures different facets of interactions within our datasets. We compared the enriched term lists from the analyses to identify processes that appear in both gene count-based and signal-based approaches, but also to identify overlapping processes. We found no overlap between processes, but there are processes that are broadly involved (as observed by gene count) and strongly supported by evidence for interaction. But there are also unique terms that appear in only one analysis (as observed by signal). Terms identified by gene count alone may represent core processes where many genes are loosely involved, providing broader functional support [
75,
76]. These may reveal more general biological environments or contexts. Terms identified by signal strength alone are more likely to show specific, potentially critical, high-confidence pathways with strong involvement [
77,
78,
79]. They may reflect highly active or essential mechanisms. However, this integrated approach not only provides us with a clear picture of the essential processes for both pathogeneses observed in SARS-CoV-2-related liver diseases, but also confirms that these molecular components always coexist in any situation. Indeed, whatever the approach used, the genes and proteins responsible for HBV and HCC were always present, all well identifiable with high significance, even if with variable ratio depending on the analysis. All this suggests that traditional mechanistic models may not fully understand very complex interplays. This shows a need for a more integrative perspective that considers a broader biological context and is no longer linked to a mechanistic vision through the calculation of mathematical models represented by networks. Therefore, the need arises for these phenomena to be represented and understood in a much broader context of biological interactions, whose explanation should come from different approaches, as we will try to explain in the discussion.
4. Discussion
These results, taken together, show that molecular processes involving both HBV and hepatocellular carcinoma are consistently. and significantly, present in all approaches used. Many components (genes or proteins) of these pathological processes are common to both diseases, other components are specific. However, they represent most of the interactomes that we have considered. This issue is important because clinicians themselves state that although the attack of SARS2 on the liver causes non-serious symptoms, there have been many reported cases in the literature of patients who fell ill with HCC after having had covid [
40,
46,
66]. Many hypotheses have emerged in these cases as researchers search for the causes. However, we believe it is important to frame clearly what an interactomic approach means and what information it can provide. This helps to explain the results.
Our evaluations made through interactomics are “aseptic” because we use interaction data obtained from experiments conducted in different laboratories on various cellular models (BioGRID). In these experiments, researchers extract single molecules from ground cells, establish direct physical interactions, purify them, and then characterize them. With them, we biologically validated each interaction studied. Although the BioGRID experimental data derive from both infected and uninfected model systems, they are “aseptic” references because they lack molecular data from actual patient cases. Therefore, any conclusions drawn from interactomic data are potential when applied to nonspecific clinical scenarios. While some anomalies can come from the functional data, collected with various methodologies, curated and archived in databases [
80,
81,
82]. The certainty of the information connected to the archived interactions is the basis of protein networks and is crucial for having reliable and real results [
1,
83,
84]. But this is not always the case, so, by implementing parametric pruning, we can manage and reduce these anomalies, resulting in a much more accurate and significant yield and a decrease in the total number of interactions. Using parametric pruning to handle data anomalies is crucial, as it ensures more reliable interaction networks and reduces noise.
It is not appropriate to compare interactomics results with macroscopic evidence such as laboratory tests or patient symptoms, since the microscopic molecular world and the meso-macroscopic world do not have linear correlation but multi-to-one, or multi-multi correlations [
85,
86,
87,
88]. This renders all the discussions that could be made between the complex intersection of viral pathogenesis, cellular mechanisms, and disease processes at least questionable [
87]. If components related to both pathologies appear in the interactome, we cannot attribute them to actual cases or make hypothetical speculations. Instead, we must explain them solely at the molecular level because they detect them through similar molecular mechanisms. The System Biology approach shows that a single molecule can take part in dozens of different processes [
88,
89,
90,
91,
92]. Only the numerical completeness of the components of the process and its significance identify which process that specific molecule belongs to [
1,
11]. All this excludes from the interactome analyses the hypotheses that a person with previous liver diseases can develop towards cancer, or vice versa, simply because we have not analyzed molecular data of patients of this type. The absence of specific molecular data from COVID patients in our analyses prevents us from discussing the relevance of this or that pathological event. The interactome merely describes, aseptically, the potentialities that virus can express in human phenotypes. However, the complexity of disease progression, especially with chronic infections like HBV, may require a more integrative approach that considers both molecular and clinical data when establishing connections. Our results show that S1 protein of SARS-CoV-2 induces microscopic conditions in the liver that may develop in various ways according to the specific physio-pathological state of each phenotype. The relationships that are drawn are correlative and not necessarily causal. Our results show the potential scenario, but the observed associations require more direct evidence to assert causality in a specific clinical scenario, such as patient genomics.
The interactions we studied derive from controlled in vivo studies in different cellular models, without direct links to specific clinical phenotypes. Model cell systems mimic an organism but are not the organism [
93,
94,
95,
96]. This reinforces the idea that molecular interactions potentially reflect mechanistic processes rather than direct clinical outcomes. The multi-to-one or multi-multi correlations between molecular mechanisms and phenotypic expressions further complicate the assumption that molecular evidence can directly explain macroscopic disease processes like the progression from hepatitis to HCC.
In short, SARS-CoV-2 S1 protein may induce microscopic conditions that develop differently based on individual phenotypic states is in line with the complexity of viral pathogenesis. This reflects the non-deterministic nature of interactome data—capturing potentialities rather than predicting phenotypic outcomes. We propose an approach to interpreting molecular data that avoids overgeneralization in linking it to clinical disease progression. We could use our molecular results as a gold standard against molecular data from specific patients, to identify whether the metabolic system of the infected patient is implementing those molecular mechanisms that drive the typical effects of a HBV infection, or even the progression to cancer. These processes should not be present in a healthy person. But they could also be a clinically useful signal of the level of severity reached by the viral infection in a patient with previous morbidity.
We can now discuss in more detail some aspects of our results through our interpretation of the interactomic approach used by exploring the different levels at which molecular variability could influence the results.
1. Phenotypic Heterogeneity in Liver Cells. Even without pre-existing liver diseases, subtle variations in liver cell signaling pathways or receptor expression could lead to diverse responses to the S1 protein. For instance, different tissues and even liver cell types variably express ACE2, a key receptor involved in SARS-CoV-2 entry [
97]. If liver cells from two individuals express ACE2 at differing levels, the downstream signaling pathways activated by S1 binding will differ, leading to heterogeneity in the subsequent cellular response (e.g., stress, apoptosis, or immune modulation).
2. S1's Influence on Liver-Specific Signaling Pathways. The S1 protein might activate or inhibit liver-specific pathways [
98]. For example: 1) JAK/STAT Pathway: This pathway plays a role in inflammation and cell growth. If S1 interacts with this pathway, it could lead to either a protective immune response or a maladaptive one, depending on how the individual's cellular environment modulates these signals. 2) MAPK Pathway: MAPK is crucial in regulating cell differentiation, proliferation, and stress responses. Variability in how liver cells manage oxidative stress, especially in the presence of external stressors (e.g., same S1 protein), could lead to apoptosis in some cells or survival in others, thus influencing outcomes like fibrosis or the progression toward HCC.
3. Metabolic Variations. SARS-CoV-2 influences host metabolism [
99] as well factors highly individualized influence liver metabolism, such as diet, alcohol consumption, and metabolic disorders as non-alcoholic fatty liver disease [
100]. If S1 protein influences metabolic pathways (e.g., by inducing oxidative stress or mitochondrial dysfunction), these pathways may respond differently depending on the metabolic state of the liver cells. Therefore, a person with a predisposition to oxidative stress may experience more severe mitochondrial dysfunction, leading to increased cell damage or a pro-fibrotic environment that could predispose them to HCC.
4. Epigenetic Modifications. Epigenetics plays a critical role in regulating gene expression in response to environmental stimuli. The chronic exposure to stressors (e.g., inflammation, viral infections) can lead to epigenetic changes in liver cells, altering their response to future stimuli [
101]. If the S1 protein induces a specific molecular process (e.g., a stress response or an inflammatory reaction), cells with pre-existing epigenetic marks may amplify or dampen this response. This could explain why some individuals experience more severe liver pathologies in SARS-CoV-2 infection, while others do not. Some key genes from the 220 gene list can be epigenetically modified and play roles in HBV and HCC.
Table 3 shows the key genes that undergo epigenetic phenomena among both the genes involved in HCC and HBV. From the table, we can appreciate the high epigenetic potential present in the set of 200 genes, both for HBV and HCC.
5. Cytokine Profiles and Immune Response. Cytokines, which can dictate the type and severity of immune responses, heavily influence the liver [
126]. IL-6, TNF-α, and TGF-β are examples of cytokines that mediate inflammation, fibrogenesis, and carcinogenesis. The S1 protein can provoke immune dysregulation, potentially altering the cytokine environment. In individuals with a predisposition to an exaggerated cytokine response [
127,
128], this could lead to excessive fibrosis or even cellular transformation. In others, a more regulated response might limit tissue damage and maintain liver homeostasis.
6. Pre-existing Cellular Microenvironments. Even without prior liver disease, individual variability in cellular microenvironments (e.g., hypoxia, local inflammatory status) could affect how liver cells respond to S1-induced signaling. Hypoxia-inducible factors (HIFs), for example, are transcription factors that respond to low oxygen levels and play a role in cellular metabolism and survival [
129]. If S1 alters cellular oxygen consumption or influences HIFs, it could exacerbate or mitigate processes like fibrosis or apoptosis, depending on the pre-existing microenvironment.
7. Autophagy and Apoptosis Pathways. The S1 protein might interfere with cellular mechanisms that regulate autophagy (a process that recycles damaged cellular components) and apoptosis [
130,
131]. Variations in the efficiency of these pathways in different individuals can lead to different outcomes: A) Efficient autophagy could protect cells by clearing damaged components, preventing fibrosis or cancer development. B) Dysregulated autophagy might contribute to the accumulation of damaged proteins or organelles, promoting liver cell death or fibrosis, particularly in individuals with pre-existing metabolic imbalances.
8. Cross-talk with Hepatic Stellate Cells. Hepatic stellate cells (HSCs) are central to liver fibrosis. If S1-induced signaling pathways lead to HSC activation, this could trigger a fibrotic response [
132]. However, the threshold for HSC activation varies from individual to individual based on previous exposure of the liver to fibrotic stimuli. In those with a "primed" fibrotic response (because of genetic predispositions or environmental factors), the fibrosis triggered by S1 may be more severe, potentially setting the stage for HCC.
9. MicroRNA (miRNA) Regulation. miRNAs are small, non-coding RNAs that regulate gene expression post-transcriptionally and are involved in processes like cell growth, differentiation, and apoptosis [
133,
134]. The S1 protein might influence miRNA profiles, leading to differential gene regulation. For instance, changes in miR-122, which is liver-specific and plays a role in liver homeostasis, could cause altered cellular responses that favor fibrosis or tumorigenesis, depending on the individual’s miRNA regulatory network.
5. Conclusions: Integrating These Mechanisms
The interplay between these pathways shows that S1-induced molecular changes do not operate in isolation. Each individual's phenotypic state, shaped by genetic, metabolic, and environmental factors, will dictate how these molecular changes manifest at the cellular and tissue levels. For example, in a healthy liver with efficient autophagy and low baseline inflammation, S1 might induce transient stress that resolves without long-term damage without activating molecular processes capable of developing. Or even, in a liver with a history of subclinical inflammation or metabolic stress, S1 may exacerbate fibrosis or activate pre-cancerous pathways, leading to disease progression.
While the S1 protein can start certain molecular processes across liver phenotypes, the extent and type of pathological outcome depend on the individual's specific molecular landscape. These molecular interactions represent a set of potentialities, with the eventual phenotypic manifestation, depending on how individual liver cells and the broader tissue environment interpret and respond to these signals.
From all these considerations, other reflections arise. The same viral stimulus of S1 protein can lead to widely different outcomes in patients because of the heterogeneity in molecular and phenotypic responses. While some individuals may experience mild or reversible liver damage, others could progress to more severe conditions, like fibrosis or even hepatocellular carcinoma (HCC). The variability in immune response, genetic predisposition, and metabolic status within the population means that some individuals may recover with minimal liver damage. But others, especially those with underlying conditions (e.g., metabolic syndrome, mild hepatic inflammation), may face a greater risk of fibrosis or cancer. All this strongly supports the idea of using interactomic standards as a reference. But it also underscores the challenge of making population-wide predictions based on molecular data alone, as individual responses to the same viral insult can vary dramatically.
Given the differences in how individuals' molecular pathways respond to viral stimuli, stratifying patients into risk categories based on their genetic, metabolic, and inflammatory profiles becomes essential. For example, High-risk patients (those with metabolic dysfunction, chronic inflammation, or pre-existing liver stress) may require closer monitoring for signs of fibrosis or liver damage during and after COVID-19. While low-risk patients (those with healthy liver function and no metabolic dysregulation) might have a lower likelihood of developing serious liver-related complications. This suggests that implementing targeted screening and intervention strategies, focusing on those more likely to experience severe liver outcomes, is more effective than applying broad measures to the entire population.
As we emphasized earlier, there is no linear correlation between the molecular changes induced by SARS-CoV-2 and the clinical manifestations seen in patients. Therefore, clinicians should be cautious in making direct clinical inferences. For example, observing an upregulation of fibrosis-related pathways in molecular studies does not imply that every patient will develop fibrosis. Instead, medical professionals should use a combination of molecular findings, patient history, and real-time clinical data, such as liver function tests, imaging, and biopsy results, to inform clinical decisions but also longitudinal studies that collect molecular data from patients.
These considerations may also flow into different sectors with Public Health Implications that are outside the scope of this article. However, future research should focus on identifying biomarkers that can predict which patients are most likely to develop liver complications post-COVID. Clinical trials should explore interventions aimed at finding out specific molecular pathways activated by the S1 protein, particularly in high-risk groups, but also of other viral proteins capable of one-to-one interactions. Even studying how long-term liver outcomes vary across different demographic and genetic populations should provide more precise data for guiding clinical practice by personalized care. These are the activities we must aim for if we want to activate a personalized molecular medicine.