Introduction
The ongoing coronavirus pandemic is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19) [1, 2, 3]. The disease has a wide clinical spectrum, ranging from asymptomatic to critical. Patients present with symptoms and complications such as fever, cough, fatigue, headache, hemoptysis, diarrhea, dyspnea, lymphocytopenia, pneumonia, acute respiratory distress syndrome, and acute cardiac injury, and with specific radiological findings of ground-glass opacities. Severe cases require hospitalization and may lead to death [4].
Being an RNA virus, SARS-CoV-2 continuously evolves both by mutation and recombination during replication of the genome, and since its first detection, several lineages, sub-lineages, and variants have evolved. Despite some variants evading immunity from previous infection or vaccination, the variability in clinical manifestation (from asymptomatic to severe) of infection cannot be explained by the viral variability alone, and several human host factors, such as age, sex, and various comorbidities are now known to play a role [
5,
6]. Similarly, the immunological effects and their role in disease pathogenesis have now also been well characterized [
7].
However, the underlying cause remains to be determined, and severe manifestations of infection, including death in otherwise healthy middle-aged individuals, suggest that genetic factors may play a role [
8]. Indeed, because of the potential importance of genetics in disease prognosis, prevention, and public health planning, the field of COVID-19 host genetics has expanded rapidly, resulting in several thousand publications on this topic in the last four years.
The results of this research, however, have been complex. Following early reports on the association between blood groups and the occurrence of infection [
9,
10], and initial examination of a handful of candidate genes, based on their association with other viral interactions, such as
ACE2, CLEC4M, MBL, ACE, CD209, FCER2, OAS-1, TLR4, and
TNF-α [
11,
12,
13], several hundred genes have since been implicated in the complex host genetic contribution to COVID-19. Genome-wide association studies (GWAS) have shown that many common genomic variants are enriched in cohorts of patients with severe COVID-19 [
14,
15]. So far, the identified polymorphisms show geographical differences and mostly have weak effects; even combining them into a polygenic risk score (PRS) has not effectively predicted disease outcome. Similarly, while it has been proposed that the accumulation of weak effects of many rare functional variants may contribute to the overall risk in patients with severe disease [
16,
17], the information on such variants is scarce in many populations, including ours.
Therefore, we aimed to determine rare genomic variants and their burden in a comprehensive set of evidence-based genes in well-characterized patients hospitalized due to COVID-19 in Slovenia.
Results
We sequenced the whole genomes of 60 patients, hospitalized during the second pandemic wave of COVID-19 in Slovenia [
7] to identify variants of interest in 517 evidence-based genes associated with severe COVID-19. The identified variants were classified according to the American College of Medical Genetics and Genomics (ACMG) criteria [
18] and the model of inheritance. The resulting pathogenic/likely pathogenic variants and risk factors, as well as their Slovenian genomic database (SGDB) background prevalence, are given in
Table 1.
In total, we identified variants of interest (pathogenic/likely pathogenic variants or risk factors) in 52 of the 60 hospitalized patients. Of the 52 patients with variants of interest, 25 (48%) presented with a single variant, and 27 (52%) presented with more than one variant (
Figure 1), highlighting the complex genetics of immunological response to viral infections and the difficulty in assigning causality.
Despite including hundreds of genes in our evidence-based panel, by adhering to strict ACMG criteria, we finally classified a total of 32 variants as pathogenic/likely pathogenic and 4 as risk factors, in 27 of the genes included in the analysis (
Table 1). No variants of interest were identified in the
LZTFL1 gene, other than the two previously reported GWAS-associated variants rs17713054 and rs35482426 (
Table 1). The estimated burden of all identified pathogenic/likely pathogenic variants of interest in our hospitalized COVID-19 patients compared to the control Slovenian population was statistically significant (p = 2.8 × 10⁻⁵).
Of these 36 variants of interest, 9 were previously reported as pathogenic/likely pathogenic in the ClinVar database [
19], while 17 were reported with conflicting interpretations (unrelated to COVID-19) that included at least one pathogenic/likely pathogenic report. A further 6 of the variants were unclassified in the ClinVar database and were classified as likely pathogenic for the first time in this study. Of the 35 patients with pathogenic/likely pathogenic variants, variants in 10 patients were consistent with the proposed inheritance model, while 25 patients were heterozygous for variants originally associated with an autosomal-recessive disease.
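The consistency check between a patient's genotype and the proposed inheritance model can be sketched as a simple rule. This is a minimal illustration and not the authors' pipeline: the function name, the "AD"/"AR" labels, and the "het"/"hom" zygosity codes are hypothetical. Because the variants were not phased, two heterozygous variants in one gene are treated only as possible compound heterozygosity.

```python
from collections import Counter


def consistent_with_inheritance(model, zygosities):
    """Check whether P/LP variants found in ONE gene fit that gene's model.

    model      -- "AD" (autosomal dominant) or "AR" (autosomal recessive);
                  hypothetical labels used only for this sketch
    zygosities -- one entry per P/LP variant in the gene: "het" or "hom"
    """
    if not zygosities:
        return False  # no P/LP variant in this gene
    counts = Counter(zygosities)
    if model == "AD":
        # A single heterozygous variant is sufficient.
        return True
    if model == "AR":
        # Homozygous variant, or two heterozygous variants (possible compound
        # heterozygosity; assumed to be in trans because phasing is missing).
        return counts["hom"] >= 1 or counts["het"] >= 2
    raise ValueError(f"unknown inheritance model: {model}")


# Illustrative cases mirroring the situations described in the text.
masp2_homozygous = consistent_with_inheritance("AR", ["hom"])
cftr_two_variants = consistent_with_inheritance("AR", ["het", "het"])
single_ar_carrier = consistent_with_inheritance("AR", ["het"])
mefv_heterozygous = consistent_with_inheritance("AD", ["het"])
print(masp2_homozygous, cftr_two_variants, single_ar_carrier, mefv_heterozygous)
```

Under this rule, a single heterozygous variant in a gene with an autosomal recessive model is reported as a carrier finding rather than as consistent with the model.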
Of the 32 identified pathogenic/likely pathogenic variants, 7 fit the inheritance model proposed for their respective genes; these were identified in 5 genes:
CFTR (two variants in possible compound heterozygosity),
MASP2 (one homozygous occurrence),
MEFV,
RNASEL, and
TNFRSF13B. Pathogenic/likely pathogenic variants consistent with the proposed inheritance model were identified in 10 patients in total (
Table 1,
Table 2).
Interestingly, of the 7 patients who died, 3 had pathogenic/likely pathogenic variants consistent with the predicted inheritance model, compared with 7 of the 53 patients who survived severe COVID-19; however, this difference was not statistically significant. Two of the deceased patients had likely pathogenic variants in the MEFV gene, and one had a homozygous pathogenic variant in MASP2. Of the remaining 4 patients who died, one had a risk factor in the MBL2 gene, one had a risk factor in the PRF1 gene, one had a pathogenic variant in the autosomal recessive (AR) gene IL36RN, and one had no variants of interest.
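The comparison of 3 of 7 deceased patients with 7 of 53 survivors can be checked with a small-sample test. The text does not name the test that was used, so the sketch below assumes a two-sided Fisher's exact test on the 2×2 table implied by those counts; the function name and table layout are illustrative.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are at most as probable as the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Rows: died / survived. Columns: with / without a P/LP variant
# consistent with the inheritance model (counts taken from the text).
p_value = fisher_exact_2x2(3, 4, 7, 46)
print(f"Fisher's exact test, two-sided p = {p_value:.4f}")
```

Under this assumption the p-value lies above the conventional 0.05 threshold, in line with the statement that the difference was not statistically significant.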
Of the remaining 7 patients with pathogenic/likely pathogenic variants who were hospitalized but survived, four had likely pathogenic variants in
MEFV, one was compound heterozygous for pathogenic/likely pathogenic variants in
CFTR, and one each had a pathogenic variant in
RNASEL and
TNFRSF13B respectively. Interestingly, most had additional risk factors (
Table 2). Indeed, most patients had several variants of interest (
Figure 1); however, the absence of such variants of interest did not show a statistically significant association with survival (χ² test, p = 0.937).
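The χ² result can be illustrated with a 2×2 table reconstructed from counts given in the text: 52 of 60 patients carried variants of interest, 7 patients died, and one of the deceased carried none. The table layout and the use of Pearson's χ² test without continuity correction are assumptions of this sketch, since the text does not specify them.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(stat / 2)).
    p = erfc(sqrt(stat / 2))
    return stat, p


# Rows: no variants of interest / at least one variant of interest.
# Columns: died / survived (reconstructed: 8 = 1 + 7 and 52 = 6 + 46).
statistic, p_value = chi2_2x2(1, 7, 6, 46)
print(f"chi-square = {statistic:.4f}, p = {p_value:.4f}")
```

With these reconstructed counts the p-value comes out at approximately 0.937, close to the value reported in the text.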
Of the 60 patients, 35 carried more than 48 risk factor variants in
MBL2 (34),
PRF1 (7), and
APOE (7). The
LZTFL1 risk factors rs17713054 and rs35482426 were present in a total of 16 patients, and were the only finding in two patients (
Table 2).
In addition to strong variants of interest, we have also identified more than 300 variants of uncertain significance in more than 200 genes from our curated evidence-based panel (
Supplementary Table 2). These variants include those already classified as variants of uncertain significance in the ClinVar database or classified as such in this study due to being ultra-rare and having at least a moderate coding impact but lacking other criteria for pathogenicity. Most of these variants were missense variants that would need a functional assessment to reach a pathogenicity classification. Their significance remains to be determined in further studies.
Discussion
The significant effort invested in studying COVID-19-host genetics by research groups worldwide has resulted in numerous insights into the pathogenesis of the disease from different fields, from genetics to immunology, highlighting the complexity of virus-host interactions. It is now known that COVID-19-host genetics is very complex and that disease progression and outcome may correlate with more than one gene or genomic region. Indeed, it is estimated that many hundreds of genes are involved and have a direct or indirect effect on the clinical course of COVID-19 infection. In our study design, we have therefore made a significant effort to include in our analysis all genes for which the highest level of evidence was gathered.
Indeed, 7 pathogenic/likely pathogenic variants consistent with the proposed inheritance model were identified in a total of 10 patients, in 5 genes: CFTR (two variants in possible compound heterozygosity), MASP2 (one homozygous occurrence), MEFV, RNASEL, and TNFRSF13B.
The
CFTR gene product, cystic fibrosis transmembrane conductance regulator, is an ATP-binding cassette transporter that functions as a ligand-gated anion channel involved in epithelial ion transport. Biallelic pathogenic variants in the
CFTR gene cause cystic fibrosis [
20], manifesting in chronic bronchopulmonary dysfunction. In our cohort, one patient was found to be a possible compound heterozygote for likely pathogenic variants in the
CFTR gene, while 5 patients were found to be carriers of pathogenic/likely pathogenic variants in the heterozygous state. Of note, the intronic
CFTR:c.1210-11T>G variant identified in our study, while considered pathogenic, is known to be so only in the compound heterozygous state with severe pathogenic
CFTR variants, while homozygous individuals are asymptomatic [
21]. However, apart from this variant, carriers of single cystic fibrosis-causing variants of the
CFTR gene were previously also shown to be more susceptible to the severe form of COVID-19 [
22].
The
MASP2 gene encodes a mannan-binding lectin serine protease. While biallelic pathogenic variants in this gene are an established cause of autosomal recessive MASP2 deficiency, which manifests with increased susceptibility to infection due to defective activation of the complement system, MASP2 deficiency is common, and many individuals are asymptomatic [
23,
24]. In our cohort, we identified one likely pathogenic variant in
MASP2 in 8 patients: 7 in heterozygous form and one in homozygous form.
Pathogenic and likely pathogenic heterozygous variants in the
MEFV gene, which encodes an innate immune sensor, lead to the production of inflammatory mediators during infection [
25], and are an established cause for autosomal dominant familial Mediterranean fever. Interestingly, it was shown that infected Mediterranean fever patients do not have a worse outcome of SARS-CoV-2 infection when already hospitalized [
26].
A pathogenic variant in the
RNASEL gene, encoding a 2-5A-dependent RNase, was found in one patient.
The RNASEL gene product is involved in the general antiviral interferon response [
27] and shows an antiviral role against the Dengue virus [
28].
Finally, a pathogenic heterozygous variant in the
TNFRSF13B gene, whose product plays a crucial role in humoral immunity [
29], was identified in one of our patients. Pathogenic variants in the
TNFRSF13B gene are known to lead to common variable immunodeficiency with either an autosomal dominant or an autosomal recessive inheritance model and were previously identified in isolated cases of patients with severe COVID-19 [
30,
31].
While the gene-associated phenotype and the proposed inheritance model were both considered as parts of the ACMG criteria, it is important to note that the association of a particular variant with SARS-CoV-2 infection outcome may not follow the same inheritance model as the original association of the gene with another disease in which this gene is involved. Therefore, although the 26 heterozygous variants in
ADAR, AIRE, ATM, BRCA2, C2, C6, C7, C8B, C9, CFD, CFI, CFTR, CYBA, FANCC, HAVCR2, HPS4, IL36RN, MASP2, PGM3, POLR3A and
RNU4ATAC genes cannot be automatically considered to have an effect due to their otherwise autosomal-recessive association with their respective diseases, their contribution to SARS-CoV-2 infection severity would need to be functionally assessed or validated by additional studies. Until such studies provide a final confirmation, their contribution to the severe manifestation of COVID-19 remains unknown (
Table 1,
Table 2). For example, it is known that pathogenic heterozygous variants in the
CFTR gene lead to increased risk for chronic pancreatitis, atypical mycobacterial infections, and bronchiectasis [
32], and a similar effect may be shown for COVID-19 in the future.
Also in line with previous research, which has shown that many different genetic factors can concurrently contribute to the course of COVID-19 infection [
16,
17], we identified more than one pathogenic/likely pathogenic variant or risk factor in almost half of the patient cohort. In particular, pathogenic/likely pathogenic variants that would not be sufficient for clinical manifestation according to the proposed inheritance model were found in 42% of patients, which raises an interesting question about their biological relevance. Indeed, similarly to other groups, we observed a clear burden of rare genomic variants: variants classified as pathogenic/likely pathogenic in genes involved in immunity and host defense, autoinflammation, and autoimmunity were enriched in the selected patient cohort with severe infection outcomes compared to the Slovenian population [
33].
Additionally, we identified risk factor variants in 58% (35/60) of our patients, and they were the sole finding in 28% (17/60) of the cohort. A total of 4 risk factor variants were identified in 3 genes: APOE, MBL2, and PRF1.
The
APOE4 risk factor (variant
APOE4:c.466T>C) was identified in 7 patients (12%). This risk factor is generally associated with hyperlipoproteinemia, cardiovascular disease, and Alzheimer's disease and has been associated with COVID-19 in several independent studies [34, 35, 36]; however, because hyperlipoproteinemia, cardiovascular disease, and dementia independently predict a worse outcome, the mechanism remains elusive. In our study, one patient was found to be homozygous for the APOE4 allele (variant c.466T>C, p.Cys156Arg).
Two common
MBL2 risk factors, increasing COVID-19 susceptibility due to MBL deficiency [
37,
38,
39], were identified in 32 patients combined (53%). Although MBL deficiency is considered an autosomal dominant disease, it is, like MASP2 deficiency, a common deficiency, and patients may remain asymptomatic. In two of these patients, both variants,
MBL2 c.161G>A (rs1800450) and c.154C>T (rs5030737) located in trans, were detected, while two patients were found to be homozygotes for the c.161G>A variant.
Finally, the risk factor
PRF1:c.272C>T variant (rs35947132) was detected in 7 patients (12%).
The PRF1 gene encodes perforin-1, a pore-forming protein homologous to complement component C9 with a similar mechanism of transmembrane channel formation [
40]. While pathogenic variants in the
PRF1 gene are associated with aplastic anemia, autosomal recessive familial hemophagocytic lymphohistiocytosis, and non-Hodgkin lymphoma, the c.272C>T risk factor variant (rs35947132) was previously also found to be enriched in several different COVID-19 patient cohorts [
41,
42,
43].
Interestingly, we did not identify any variants of interest in the
LZTFL1 gene, which is located within the 3p21.31 “Neanderthal” risk haplotype identified as important by large GWAS [
14,
44], and carried by approximately 16% of Europeans [
45]. Therefore, we also compared the frequency of the two previously reported intergenic variants surrounding the
LZTFL1 gene, rs17713054 [
14,
44] and rs35482426 [
14] in the patient cohort with that of the Slovenian population. There was no difference in the observed allelic frequency of the intergenic rs17713054 variant or the intronic rs35482426 variant between our patient cohort and the general Slovenian population (
Table 1).
While our patient cohort was heterogeneous with high comorbidities, this is a typical scenario observed in clinical practice when hospitalization is required for COVID-19. In such patients, by using an evidence-based gene panel and ACMG criteria for classification, pathogenic/likely pathogenic genomic variants were identified in 17% (10/60) of the patients. The presence of pathogenic/likely pathogenic variants consistent with the inheritance model was statistically significantly related to a further negative outcome. However, patients with no identified variants were not less likely to die, limiting our ability to draw any conclusion based on this small number of patients.
Limitations
Our work has the following inherent limitations, which are shared with other similar studies: the composition of the control and patient groups, the sample size, and the limitations of variant classification. The control group consisted of individuals with unknown COVID-19 infection outcomes. Despite initially aiming to include a control group with mild or absent infection outcome, during the study, the occurrence of multiple infections as well as different vaccination statuses prevented us from obtaining such a group (e.g., vaccinated individuals may have a mild phenotype despite carrying a genetic variant conferring risk for a severe COVID-19 outcome). Similarly, to avoid any inherent bias, the patients hospitalized due to COVID-19 were selected in a blinded fashion regarding their immunological status (as determined in the original immunological study to which they were recruited). Despite these measures, the relatively small sample size of the severe COVID-19 cohort limits our ability to detect all rare variants in our population. Furthermore, the ACMG classification criteria limited our ability to classify variants as pathogenic to mostly exonic variants, where their pathogenicity could be assessed based on their predicted outcome on the protein level without further functional studies. Finally, since we did not perform phasing of variants, compound heterozygosity of the variants in genes with the proposed autosomal recessive inheritance model is assumed but needs additional confirmation.
Altogether, genomic analysis has identified pathogenic/likely pathogenic variants in CFTR, MASP2, MEFV, RNASEL, and TNFRSF13B genes in 10 patients included in this study. As these genes are associated with immunodeficiency, susceptibility to infections, and inflammatory disorders, and the variants were consistent with the proposed inheritance model for the gene, these variants could likely be major contributors to the severity of SARS-CoV-2 infection in the case of these 10 patients.
On the other hand, more than one-third of patients carried a single pathogenic/likely pathogenic variant in at least one COVID-19-associated gene with the proposed autosomal recessive inheritance model. These variants may additionally contribute to the severe condition of patients with COVID-19, which is a multifactorial disease. While it would be interesting to investigate the pathogenicity of intronic and other non-coding variants in genes with an autosomal recessive inheritance model in patients in whom one pathogenic/likely pathogenic variant or variant of uncertain significance has already been identified, this is unfortunately beyond the scope of our current research.