1. Introduction
Noroviruses (NoVs) are a major cause of acute gastroenteritis (AGE) worldwide. NoVs belong to the family
Caliciviridae and are non-enveloped viruses with a positive-sense single-stranded RNA genome (7.3 to 7.5 kb), organized into three open reading frames (ORFs). ORF1 encodes a large non-structural polyprotein that includes the viral RNA-dependent RNA polymerase (RdRp), whilst ORF2 encodes the major capsid protein (VP1) and ORF3 encodes the minor capsid protein (VP2) [
1]. NoVs were first identified upon electron microscopy observation of stools of patients involved in an outbreak of AGE at an elementary school in Norwalk, Ohio, USA in 1968 and the causative agent was therefore named Norwalk virus [
2]. With the development of specific molecular assays in the 1990s, the role of NoVs as causative agents of AGE was clarified [
3]. NoVs are responsible for more than 90% of non-bacterial AGE epidemics worldwide and are considered the first or second most important cause of diarrhoea in children, along with rotaviruses [
4,
5]. NoVs have been estimated to cause around 1.1 million hospitalizations and up to 200,000 deaths per year, mostly in children less than 5 years of age in developing countries [
4,
5]. The genetic diversity of NoVs is a challenge for diagnostics, for classification, and for the development of vaccines. Based on the complete amino acid (aa) sequence of VP1, NoVs are classified into ten genogroups (GI-GX) and 49 capsid (Cap) genotypes. Based on partial nucleotide (nt) sequences of the polymerase region, 60 confirmed polymerase (Pol) types have also been described [
6]. The majority of NoV strains associated with disease in humans belong to genogroups GI and GII, and are further classified into more than 40 human genotypes [
7]. Multiple NoV genotypes co-circulate in human populations but GII genotype 4 (GII.4) has been associated with most (>80%) outbreaks and sporadic cases of gastroenteritis in both developed and developing countries [
8,
9,
10]. Since the mid-1990s, six global epidemics of NoV GII.4 have been documented, each associated with the emergence of a novel GII.4 variant, at intervals of 2-3 years. The pandemic GII.4 variants include US95_96, which emerged in the mid-1990s [
11,
12], followed by Farmington Hills 2002 in 2002 [
13,
14], Hunter 2004 in 2004 [
15], Den Haag 2006b in 2007 [
16,
17], New Orleans 2009 in 2009 [
18] and finally by Sydney 2012 in 2011-12 [
19,
20]. It has been proposed that new pandemic GII.4 NoV variants generally evolve through the acquisition of residue substitutions in the capsid protein VP1 that alter antigenicity, enabling evasion of host population immunity [
19,
21,
22,
23,
24,
25,
26] and/or modify affinity for histo-blood group antigen (HBGA) receptors [
27]. In addition to the pandemic GII.4 variants of global relevance, minor GII.4 variants have been described in epidemics restricted to specific geographical regions, namely the variants Asia 2003 [
28], Yerseke_2006a [
16], Osaka 2007 [
29] and Apeldoorn 2008 [
30]. NoV genotyping has been complicated by the emergence of recombinant strains that have polymerase and capsid regions derived from separate ancestral strains [
20]. Knowledge of the global molecular epidemiology of emerging GII.4 strains is largely based on data from outbreak surveillance programmes established worldwide in the 2000s and 2010s. Improvements in diagnostics, with the development and wide adoption of molecular assays for NoVs, have provided valuable information on NoV epidemiology over the last two decades, but information on the diversity of NoVs in the 1980s and 1990s is more limited. Uninterrupted surveillance for AGE in hospitalized children has been carried out in Palermo, Italy, since the mid-1980s, providing a unique collection spanning 35 consecutive years that can be used as a time machine to retrospectively investigate the genetic evolution of enteric viruses. In this study, we generated epidemiological and sequence data on the NoV strains that circulated in the local paediatric population of Palermo from 1986 to 2020.
2. Material and Methods
Over 35 consecutive years, from 1986 to 2020, uninterrupted NoV surveillance was conducted in Palermo, southern Italy. A total of 8433 stool samples were collected from paediatric patients (<5 years old) hospitalized with AGE at the “G. Di Cristina” Children’s Hospital. AGE was defined as at least 3 watery stools in 24 h, with or without bouts of vomiting, lasting less than 7 days, with no identifiable symptoms other than those related to infectious gastroenteritis. Stool samples were collected within 12 h of admission to the hospital, to avoid inclusion of nosocomial cases, and stored at -20 or -80 °C until processing. Viral RNA from stool samples collected from 1986 to 2000 was extracted using the ELITE InGenius automated extraction platform (ELITechGroup, Inc., Bothell, WA). For samples collected from 2001 to 2020, viral RNA was extracted from 140 μl of a 10% stool suspension using a QIAamp Viral RNA Mini Kit (QIAGEN, Hilden, Germany), according to the manufacturer’s instructions. Complementary DNA (cDNA) was obtained by reverse transcription with random hexamers and MMLV reverse transcriptase (Invitrogen, Carlsbad, CA). A quantitative reverse transcription PCR (qRT-PCR) assay, able to differentiate between GI and GII NoV-positive samples, was used to detect NoV RNA [
19]. NoV-positive specimens were genotyped with a multi-target strategy, generating sequence data on the diagnostic region A (spanning the ORF1 region coding for the polymerase) and region C (encompassing the initial part of the ORF2 and coding for the capsid), using primers JV12/JV13 and COG2F/G2SKR, respectively [
31,
32,
33,
34,
35]. A selection of 40 samples, representative of the different NoV GII.4 variants observed during the study period, was also analysed in the hypervariable capsid P2 domain using primers EVP2F and EVP2R, as previously described [
36].
Sequence alignment was performed with CLUSTAL W [
37]. Phylogenetic analysis was carried out using the MEGA X software [
38]: phylogenetic trees of the partial Pol and Cap sequences were constructed using the maximum-likelihood method, with the Kimura 2-parameter substitution model and 1,000 bootstrap replicates. Genotype assignment was performed using the Noronet automated genotyping tool (
https://www.rivm.nl/en/noronet/databases) and the CDC calicivirus typing tool (
https://norovirus.ng.philab.cdc.gov/).
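As an illustration of the substitution model named above, the following minimal Python sketch computes the closed-form Kimura 2-parameter distance between two aligned nucleotide sequences. It is not the MEGA X workflow used in this study: it does not perform maximum-likelihood tree inference or bootstrapping, and the function name `k2p_distance` and the pairwise-deletion handling of gaps and ambiguous bases are assumptions made for the example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.

    d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the
    proportions of sites showing transitions and transversions.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    # Treat RNA (U) and DNA (T) notation alike.
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")

    transitions = transversions = compared = 0
    for a, b in zip(s1, s2):
        # Assumption: sites with gaps or ambiguous bases are skipped.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Because transitions and transversions enter the formula separately, one transition in ten sites gives a slightly larger distance than one transversion in ten sites, and both exceed the uncorrected proportion of differences (0.1).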
4. Discussion
After NoVs were first identified by electron microscopy as a cause of AGE in 1972 [
2], the role of NoVs was long underestimated, until the development and adoption of specific molecular assays for routine diagnostics in the 1990s. In parallel, the literature on NoV has grown significantly since the year 2000, with an average of 30 manuscripts per year, versus fewer than 4 manuscripts per year in the second half of the 1990s (
https://pubmed.ncbi.nlm.nih.gov/?term=norovirus, searched on 01/01/2023). However, information on the epidemiology of the NoVs circulating before the 2000s is limited and fragmentary [
39,
41,
42].
In this archival retrospective study, NoV molecular epidemiology was investigated over 35 consecutive years, from 1986 to 2020. This archive of stool specimens and/or genetic material extracted from faecal samples derives from one of the longest-running enteric virus surveillance programmes conducted in Europe, providing an essential tool to investigate the evolution of NoVs.
Over the study period, NoV infection was detected in 17% of paediatric patients (<5 years old) hospitalized in Palermo, Italy. GII NoV, first detected in Palermo in 1989, represented the prevalent genogroup, accounting for 15.6% of paediatric gastroenteritis and reaching the highest rate (30%) in 2006 (
Figure 1). GI NoVs were first detected in Palermo in 1994 and were found only occasionally over the remaining study period, except in 2002-2004 and 2011, when they were responsible for 3.2-7% of AGE cases. The absence of NoV in the very first years of the surveillance activity in Sicily, from 1986 to 1988, could be ascribed to the low number of samples tested in 1986 and 1987, but in 1988 a considerable number of faecal samples (182) tested negative for NoV. Although climate variations may affect seasonal NoV circulation and long-term storage of samples can affect the stability of nucleic acids, our results could simply reflect the local viral epidemiology of that period, suggesting that NoVs were introduced in Palermo only at the very end of the 1980s and circulated to a limited extent until the mid-1990s. Alternatively, mutations in the primer/probe binding sites could have hindered the detection of the earliest NoV strains with the molecular assays used in this study.
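To put the 1988 result (no positives among 182 samples) in perspective, the sketch below computes an exact one-sided upper confidence bound for a proportion when zero positives are observed. It is an illustration only and was not part of the study's analysis; it assumes independent samples and a perfectly sensitive assay, which the caveats above about nucleic acid stability and primer/probe mismatches may not support. The function name is hypothetical.

```python
def upper_bound_zero_positives(n_tested: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on prevalence when 0 of n_tested are positive.

    With zero positives, the exact binomial bound solves
    (1 - p) ** n_tested = 1 - confidence for p.
    """
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_tested)


# Assumption: idealized assay; 182 is the number of samples tested in 1988.
bound_1988 = upper_bound_zero_positives(182)
print(f"95% upper bound on 1988 prevalence: {bound_1988:.2%}")
```

Under these idealized assumptions the bound is about 1.6%, well below the 17% detection rate observed over the whole study period.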
In order to investigate the genetic variability of GII NoVs over time, Cap (ORF2) sequence analysis was performed, unveiling a high genotype diversity until 1994, followed by the predominance of GII.4 genotype from 1995 to 2020, with sporadic peaks of activity of GII.3 and GII.2 genotypes in 2003-2004 and 2016, respectively (
Figure 2a). The persisting epidemiological relevance of the GII.4 genotype in Palermo was characterized by a fast rate of evolution, due to the accumulation of point mutations within the protruding (P) domain of the capsid (10⁻³ nt substitutions/site/year), coupled with intra- and inter-genotype recombination at the ORF1–ORF2 overlap in more recent years, starting from 2012 with the Sydney strain. These mechanisms have been proposed to drive the selection of strains with improved fitness and with the ability to evade the immune response [
20,
43]. As previously observed worldwide, in Palermo nine variants of GII.4 NoVs (Lordsdale, US95/96, Farmington Hills 2002, Hunter 2004, Yerseke 2006a, Den Haag 2006b, Apeldoorn 2007, New Orleans 2009 and Sydney 2012) emerged consecutively [
5,
8,
22,
44,
45], each completely replacing its predecessor every two to three years over the study period (
Figure 3a). The first NoV detected in this study (in 1989) was a GII.4 strain with a Cap gene closely related (99.8% nt identity) to the Lordsdale strain identified in the UK in 1993 (X86557) [
46]. Lindesmith et al. hypothesized that pre-1995 Camberwell-like strains typically produced low-level endemic diseases in human populations, whereas since the mid-1990s the accumulation of point mutations has promoted the spread of post-1996 Lordsdale/Grimsby strains [
23]. However, the limited availability of NoV sequences from the 1980s makes it difficult to date the emergence of such an ancient genotype [
47]. Recombination events were rarely detected in the older Italian strains, which usually carried their canonical GII.P4 polymerase, with the exception of a single GII.4_US95/96[GII.P2] strain detected in 1994. Conversely, in the last decade sequential recombination events repeatedly affected the GII.4 Sydney 2012 variant. As already reported, this variant actually emerged in Italy in 2011 as a pre-epidemic strain with the original GII.P31 polymerase, preceding its Australian and global circulation [
19,
40].
Thereafter, the local circulation of the Sydney variant was sustained by the acquisition of a GII.P4 New Orleans 2009 polymerase in 2013 and a GII.P16 polymerase in 2017 [
35,
48,
49,
50]. The sequential acquisition of such Pol genes may have been the key to the success of the Sydney variant and boosted its global emergence and spread.
The protruding P2 domain of the Cap protein carries the epitopes involved in binding to the host cell and responsible for virus antigenicity [
21,
51]. The P2 domain was sequenced to better understand the evolution of GII.4 NoV strains over time. The aa alignment of 22 GII.4 Italian NoV strains selected over the study period showed point mutations accumulating over time, associated with the sequential emergence of GII.4 variants every two to three years. In particular, a conserved aa insertion at position 394 in Epitope D (amino acids 393-395), mostly a threonine residue, was observed in all GII.4 strains since the emergence of the Farmington Hills GII.4 variant in 2002 [
52]. A change at residue 395 has been shown to alter GII.4 NoV antigenic profile [
23]. The crystal structure of the putative Epitope D has shown its strategic position on the surface of the capsid: this epitope interacts with the histo-blood group antigen (HBGA) binding site, suggesting a role for such mutations in both receptor switching and escape from herd immunity [
53,
54]. It was previously described that the older GII.4 variants (i.e., Camberwell, Bristol, Lordsdale and US95/96) bound strongly only to the H antigen of HBGA, while the newer GII.4 variants extended their binding capability to the A and B antigens [
54]. With respect to the ancestral strain GII.4 Camberwell 1987, no aa changes were observed in Epitopes A and E among the older Italian GII.4 strains, whereas several aa changes in Epitopes B, C and D were observed from the detection of the GII.4 US95/96 variant onwards. Interestingly, the majority of aa substitutions accumulated from 2002 onwards, with the emerging Farmington Hills strain, and the mutation H395T represented the key shift in the antigenic milieu of the GII.4 NoVs. Since 2006, additional amino acid mutations have also been observed in Epitope A, located on the surface ridge of the capsid and probably involved in the evolution and adaptation of the novel GII.4 variants [
55]. The direct role of Epitope A in the escape phenotype was further demonstrated by the Den Haag 2006b variant, which carried amino acid changes at positions 294, 296-298, 368 and 372 [
9].
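The epitope positions quoted above can be used to annotate individual substitutions. The following minimal Python sketch maps a substitution written in the usual one-letter form (for example H395T) to the epitope containing that residue. Only Epitope A (294, 296-298, 368, 372) and Epitope D (393-395) are included, because these are the only positions listed in the text; the function name and the example inputs other than H395T are hypothetical.

```python
import re
from typing import Optional

# Residue positions taken from the text above. Epitopes B, C and E are
# omitted because their positions are not listed in this section.
EPITOPES = {
    "A": {294, 296, 297, 298, 368, 372},
    "D": {393, 394, 395},
}

# One-letter residue, position, one-letter residue (e.g. "H395T").
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def epitope_of(substitution: str) -> Optional[str]:
    """Return the epitope name for a substitution, or None if unmapped."""
    match = SUBSTITUTION.match(substitution.strip().upper())
    if match is None:
        raise ValueError(f"not a substitution of the form H395T: {substitution!r}")
    position = int(match.group(2))
    for name, positions in EPITOPES.items():
        if position in positions:
            return name
    return None


# H395T is the mutation discussed in the text; the other two inputs are
# made-up examples used only to exercise the lookup.
for sub in ("H395T", "A294P", "N100S"):
    print(sub, "->", epitope_of(sub))
```

A lookup of this kind only flags where a change falls; it says nothing about the antigenic effect of the change, which depends on the residues involved.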
Over the 35 years of surveillance, GII.3 NoVs represented the second most relevant genotype detected in Palermo, as also reported in other epidemiological studies elsewhere [
56,
57]. By phylogenetic analysis of the Cap gene, four different clusters (I-IV) have been described among GII.3 strains [
39]. Clusters I and II contained the oldest GII.3 strains, detected in the 1970s, 1980s and 1990s, while clusters III and IV included the strains circulating since the 2000s. In Palermo GII.3 strains belonging to the four clusters described in literature were detected over the study period, with clusters III and IV temporally overlapping from 2012 to 2016 and an exclusive circulation of cluster IV thereafter. Italian GII.3 strains circulating from 1994 to 1997 in Palermo exhibited a P3 Pol gene. GII.3[P3] strains emerged globally in the 1980s and 1990s [
58]. After an apparent absence of circulation for 5 years, from 1998 to 2002, a succession of recombination events affecting GII.3 strains was detected in Palermo from 2003 onwards, with the acquisition of P21, P12, and P16 Pol genes. The GII.3[P21] Cap/Pol combination represented one of the most successful GII.3 variants, being associated with symptomatic infections in children worldwide from 2000 to 2009 [
59,
60,
61]. The increased mutation rate observed in the recombinant GII.3[P21] strains probably improved viral fitness [
62]. As observed in Palermo, the progressive replacement of strains belonging to different Cap clusters and the acquisition of Pol genes through recombination events possibly allowed the persistent detection of GII.3 strains. Recombinant GII.3[P21] strains were detected from 2003 to 2006, from 2013 to 2016 and in 2018, while GII.3[P12] strains circulated in 2012 and 2019, and GII.3[P16] strains in 2011-2013. The latter strains were closely related to NoVs detected in Parma (Italy) and Bangladesh in the same period [
63,
64].
GII.2 NoV represented the third most relevant genotype detected over the study period, showing different Cap lineages and Cap/Pol combinations (
Figure 3c). In particular, the strain GII.2[P2] circulated from 1990 to 1994 and again in 2011, whilst recombinant strains with polymerase GII.P34, GII.P4_2006b, and GII.P16 appeared in 1996, 2009 and 2016, respectively. GII.2 NoVs usually account for <1-1.5% of infections globally, with sporadic peaks of circulation [
65,
66,
67,
68,
69]. On analysis of the Cap gene, the GII.2[P2] Italian strains collected in 2011-2016 were closely related to the GII.2[P16] Nashville strain (KY865307), presumed to be the donor of the polymerase for recombinant GII.4[P16] viruses [
70,
71]. Starting from 2011 the circulation of GII.2 genotype in Palermo was sustained by a variety of strains combining two different Cap lineages and Cap/Pol combinations.
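The Cap/Pol designations used throughout this discussion, such as GII.3[P21] or GII.4_US95/96[GII.P2], follow a regular pattern that can be parsed mechanically. The Python sketch below splits such a label into its capsid genotype and polymerase type and checks whether the two numbers match. The regular expression, the names `DualType`, `parse_dual_type` and `has_matching_numbers`, and the matching-number rule are assumptions made for illustration: the rule agrees with the combinations described here as canonical (GII.4 with GII.P4, GII.3 with P3, GII.2 with P2), but it would also flag GII.4 Sydney 2012 with GII.P31, which the text describes as that variant's original polymerase.

```python
import re
from typing import NamedTuple


class DualType(NamedTuple):
    capsid: str      # capsid genotype, e.g. "GII.4"
    polymerase: str  # polymerase type, e.g. "P16"


# Capsid genotype, optional "_variant" suffix, then the polymerase type in
# square brackets with an optional genogroup prefix and variant suffix.
DUAL_TYPE = re.compile(
    r"^(G[IVX]+\.\d+)(?:_[^\[]+)?\[(?:G[IVX]+\.)?(P\d+)(?:_[^\]]+)?\]$"
)


def parse_dual_type(label: str) -> DualType:
    """Split a label such as 'GII.3[P21]' into capsid and polymerase types."""
    match = DUAL_TYPE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized dual-type label: {label!r}")
    return DualType(capsid=match.group(1), polymerase=match.group(2))


def has_matching_numbers(dual: DualType) -> bool:
    """True when the capsid and polymerase numbers agree (e.g. GII.2[P2]).

    Simplifying assumption for this sketch, not a typing rule.
    """
    capsid_number = dual.capsid.split(".")[1]
    polymerase_number = dual.polymerase[1:]
    return capsid_number == polymerase_number


for label in ("GII.2[P2]", "GII.3[P21]", "GII.4_US95/96[GII.P2]"):
    dual = parse_dual_type(label)
    print(label, dual, has_matching_numbers(dual))
```

Variant names in the label are deliberately discarded, so the sketch cannot distinguish, for example, a GII.P4 polymerase of the New Orleans 2009 lineage from one of the Den Haag 2006b lineage.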