1. Introduction
The COVID-19 pandemic, caused by the SARS-CoV-2 virus, highlighted the critical need for robust epidemiological tools to monitor and mitigate the spread of infectious diseases. One of the most significant challenges in mitigating the pandemic has been the emergence and spread of viral variants, which can differ in transmissibility and virulence and can reduce vaccine efficacy [1, 2]. Genome sequencing has emerged as a pivotal technology in tracking the temporal diversity of SARS-CoV-2 variants, providing invaluable insights into the evolutionary dynamics of the virus and informing public health responses.
Genome sequencing allows for the characterization of the detailed genetic makeup of SARS-CoV-2, enabling the identification of mutations and the classification of viral lineages. This technology has been instrumental in detecting variants of concern (VOCs) and variants of interest (VOIs), such as Alpha (B.1.1.7), Delta (B.1.617.2), and Omicron (B.1.1.529), which have had significant impacts on the trajectory of the pandemic [3, 4]. By analyzing the viral genome, researchers can track the emergence of these variants and their spread across different regions and time periods, providing a real-time map of viral evolution [5, 6, 7].
Wastewater-based epidemiology (WBE) has emerged as a complementary approach to clinical testing, offering a non-invasive and cost-effective means to monitor community-level infections. This method involves the collection and analysis of wastewater samples to detect the presence of SARS-CoV-2 genetic material, allowing for the early detection of outbreaks and the assessment of variant prevalence in the population [8, 9]. The integration of genome sequencing with WBE has proven particularly powerful, enabling the detection of low-frequency variants and providing a comprehensive view of viral diversity in a community [6, 10, 11, 12].
Oxford Nanopore Technologies (ONT) sequencing has been widely adopted for SARS-CoV-2 due to its portability, rapid turnaround time, and ability to generate long reads, which are beneficial for detecting complex variants and reconstructing viral genomes from mixed samples [9, 13]. Studies employing ONT for wastewater sequencing have successfully identified and tracked the temporal diversity of SARS-CoV-2 variants, demonstrating its utility in public health surveillance and outbreak management [9, 14, 15, 16].
Genome sequencing offers a more comprehensive approach to identifying COVID-19 variants than PCR testing, which was the pioneering tool developed to detect COVID-19 in wastewater early in the pandemic [8, 17]. While PCR tests are effective in detecting the presence of the virus, they do not provide detailed information about the genetic sequence, which is crucial for identifying and tracking specific variants [18]. PCR-based identification of COVID-19 variants is also limited to the detection of specific, known mutations, whereas genome sequencing reads the entire genetic code of the virus, allowing researchers to detect new and emerging mutations and understand how the virus is evolving [6, 16, 19, 20].
Few studies have compared the usefulness and accuracy of the two methods side by side [21], and most of those that have used clinical nasal swabs [4, 22, 23] or a combination of clinical and wastewater samples [24, 25]. In a review of 80 studies on COVID-19 variant determination, only two compared sequencing and PCR variant detection in wastewater [21, 26, 27]. This study aims to compare the application of genome sequencing in tracking the temporal diversity of COVID-19 variants with a commercially available quantitative polymerase chain reaction wastewater variant kit. The study occurred during a period of transition between various Omicron subvariants: BA.1, BA.2, BA.4, BA.5, XBB, and BQ.1. Additionally, the study evaluates the effectiveness of these combined methodologies in offering early warning signs for public health interventions and in understanding the geographical spread and persistence of different variants. This integrated approach is particularly relevant for informing targeted public health strategies, especially in rural areas with limited clinical sequencing capabilities. The insights gained from this study will contribute to the optimization of wastewater-based epidemiology as a valuable tool in managing current and future pandemics.
2. Materials and Methods
2.1. Study Location
Samples were collected weekly, beginning in June 2021, from 16 sites across Michigan’s Eastern Upper Peninsula (EUP). The EUP includes Alger, Chippewa, Luce, Mackinac, and Schoolcraft Counties, totaling about 70,000 residents over 5,566 square miles, with an average density of 11.3 people per square mile. Only 13 of the sites were included in this study: two were not sampled during the winter due to prohibitive ice and snow cover, and one had no samples during the study period (January 9, 2023 to April 27, 2023) that met the minimum criterion for genome sequencing (described in Section 2.7).
Figure 1 shows the sampling site locations included in this study.
2.2. Wastewater Sampling
Wastewater grab samples (250 mL) were collected from wastewater influent streams once per week. Samples were refrigerated or kept on ice until processed (up to 48 hours).
2.3. Viral Concentration
Each sample of raw wastewater (100 mL) was combined with 8% (w/v) molecular grade polyethylene glycol (PEG) 8000 (Fisher Scientific) and 0.2 M NaCl (Fisher Scientific). After mixing for two hours at 230 rpm and 4°C, samples were centrifuged at 4200 x g for 45 minutes at 4°C. The supernatant was removed using a sterile serological pipet, and the pellet was resuspended in the residual liquid.
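As a worked illustration of the reagent arithmetic implied above, the following sketch converts the stated concentrations into masses per sample. It assumes the reagents are added as solids directly to the 100 mL sample and ignores the small volume change they cause; the text does not say whether solids or stock solutions were used, and the function name is hypothetical.

```python
NACL_MOLAR_MASS_G_PER_MOL = 58.44  # standard molar mass of NaCl


def peg_nacl_masses(sample_volume_ml, peg_percent_w_v=8.0, nacl_molar=0.2):
    """Return (PEG grams, NaCl grams) for one sample.

    Assumption: solids are added directly and their effect on the
    final volume is ignored.
    """
    # % (w/v) means grams per 100 mL
    peg_g = peg_percent_w_v / 100.0 * sample_volume_ml
    # mol/L * L * g/mol
    nacl_g = nacl_molar * (sample_volume_ml / 1000.0) * NACL_MOLAR_MASS_G_PER_MOL
    return peg_g, nacl_g


peg_g, nacl_g = peg_nacl_masses(100.0)
print(f"PEG 8000: {peg_g:.2f} g, NaCl: {nacl_g:.2f} g")
# prints "PEG 8000: 8.00 g, NaCl: 1.17 g"
```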
2.4. RNA Extraction
Viral ribonucleic acid (RNA) was extracted from concentrated samples using the Qiagen QIAamp Viral RNA Mini Kit following the manufacturer’s custom protocol for the QIAcube Connect (Qiagen, Germany). RNA extraction resulted in a final elution volume of 80 µL. Extracted RNA was used immediately for viral detection and quantification or stored at -80℃ for later use.
2.5. Initial Virus Detection and Quantification
Bio-Rad’s One-step RT-ddPCR Advanced kit was used with the Bio-Rad Automated Droplet Generator and the QX200 ddPCR system to quantify N1, N2, and Phi6 RNA (Bio-Rad, USA). Each reaction contained a final concentration of 1x Supermix (Bio-Rad, USA), 20 U/µL reverse transcriptase (RT) (Bio-Rad, USA), 15 mM DTT (Bio-Rad, USA), 900 nM of each primer (BioSearch Tech), 250 nM of each probe (BioSearch Tech), 1 µL of nuclease-free water, and 5.5 µL of template RNA. The final reaction volume was 22 µL. Quality control samples on each plate included a non-template control, extraction control, and processing blank. Samples, controls, and blanks were analyzed in triplicate.
Droplets were generated in the Bio-Rad Automated Droplet Generator (ADG) by combining 20 µL of reaction volume with 70 µL of droplet generator oil (Bio-Rad, USA), resulting in a reaction mixture-oil emulsion of 40 µL containing up to 20,000 droplets. The droplets were transferred, via the ADG, to a 96-well PCR plate that was then heat-sealed with foil and put in a Bio-Rad C1000 deep-well thermal cycler for PCR amplification under the following conditions: 25℃ for 3 minutes, 50℃ for 60 minutes, 95℃ for 10 minutes, 40 cycles of 95℃ for 30 seconds and 55℃ for 1 minute, 98℃ for 10 minutes, and hold at 4℃. After thermal cycling, the plate was transferred to the Bio-Rad QX200 Droplet Reader for concentration determination via spectrophotometric detection of fluorescent probe signal in gene-target positive droplets. Amplitude thresholding was performed manually for each analysis using the QuantaSoft (BioRad) software. Lower limit of detection, N1, N2, and Phi6 gene copies for each sample were then determined using the QuantaSoft output.
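The concentration reported by a droplet reader is conventionally derived from the fraction of negative droplets using Poisson statistics. The sketch below illustrates that general calculation only; it is not the QuantaSoft implementation. The nominal droplet volume of 0.85 nL is an assumption that is not stated in the text, and the droplet counts in the example are hypothetical.

```python
import math

DROPLET_VOLUME_NL = 0.85  # assumed nominal droplet volume; not given in the text


def copies_per_ul_reaction(positive, total, droplet_volume_nl=DROPLET_VOLUME_NL):
    """Poisson-corrected target concentration in copies per µL of reaction."""
    if total <= 0 or not 0 <= positive < total:
        # A fully positive well is saturated and cannot be quantified this way.
        raise ValueError("need 0 <= positive < total and total > 0")
    fraction_negative = (total - positive) / total
    # Mean copies per droplet under a Poisson model: lambda = -ln(P(negative))
    copies_per_droplet = -math.log(fraction_negative)
    return copies_per_droplet / (droplet_volume_nl * 1e-3)  # nL -> µL


# Hypothetical well: 150 positive droplets out of 15,000 accepted droplets
example = copies_per_ul_reaction(150, 15_000)
print(f"{example:.1f} copies/µL of reaction")
```

Converting this value to gene copies per 100 mL of wastewater additionally requires the template, elution, and concentrate volumes, which depend on the laboratory workflow.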
2.6. Variant Determination using ddPCR
Samples that were N1 or N2-positive in the initial detection were then tested for variant detection using the BA.1 (A67V; del69-70 mutations) and BA.2 (R408S mutation) discrimination assay kit (GT Molecular). Each reaction contained 5.5 µL Supermix (Bio-Rad, USA), 2.2 µL reverse transcriptase (RT) (Bio-Rad, USA), 1.1 µL DTT (Bio-Rad, USA), 1 µL GT primer-probe solution (GT Molecular), 6.7 µL of nuclease-free water, and 5.5 µL of template RNA for a total reaction volume of 22 µL. Droplet generation was performed in the same manner as previously described. Thermal cycling conditions were as follows: 50℃ for 60 minutes, 95℃ for 10 minutes, 45 cycles of 94℃ for 30 seconds and 60℃ for 1 minute, 98℃ for 10 minutes, and hold at 4℃ for 30 minutes. Concentration and target gene copies were determined in the same manner as described above for N1/N2 determination.
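The per-reaction volumes listed above can be checked against the stated 22 µL total and scaled to a batch. The following is a minimal sketch; the 10% pipetting overage and the reaction count are hypothetical choices for illustration, not values from the protocol.

```python
from decimal import Decimal

# Per-reaction volumes (µL) as listed in the text
VARIANT_ASSAY_UL = {
    "Supermix": Decimal("5.5"),
    "Reverse transcriptase": Decimal("2.2"),
    "DTT": Decimal("1.1"),
    "GT primer-probe solution": Decimal("1"),
    "Nuclease-free water": Decimal("6.7"),
    "Template RNA": Decimal("5.5"),
}

total_ul = sum(VARIANT_ASSAY_UL.values())
assert total_ul == Decimal("22.0"), "components should sum to the 22 µL reaction"


def scale_master_mix(n_reactions, overage=Decimal("0.10")):
    """Scale every component except template RNA, which is added per well.

    The overage fraction is a hypothetical allowance for pipetting loss.
    """
    factor = Decimal(n_reactions) * (1 + overage)
    return {
        name: volume * factor
        for name, volume in VARIANT_ASSAY_UL.items()
        if name != "Template RNA"
    }


mix = scale_master_mix(3)  # e.g. one sample run in three wells
for name, volume in mix.items():
    print(f"{name}: {volume:.2f} µL")
```

Decimal arithmetic is used here so that the volume check is exact rather than subject to floating-point rounding.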
2.7. Variant Determination using Genome Sequencing
Previously extracted RNA was used for genome sequencing. Sequencing was performed retrospectively, several months after droplet digital polymerase chain reaction (ddPCR) variant detection. Only samples with an N1 and N2 combined concentration of ≥9000 gene copies (GC) per 100 mL were sequenced [28]. Forty-three samples during the study period (January 9, 2023 to April 27, 2023) met this criterion for sequencing.
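The selection rule above amounts to a simple threshold on the combined N1 and N2 concentration. A minimal sketch follows; the sample records and field names are hypothetical, and only the 9000 GC per 100 mL threshold comes from the text.

```python
SEQUENCING_THRESHOLD_GC_PER_100ML = 9000  # threshold stated in the text


def eligible_for_sequencing(n1_gc_per_100ml, n2_gc_per_100ml,
                            threshold=SEQUENCING_THRESHOLD_GC_PER_100ML):
    """True when the combined N1 + N2 concentration meets the threshold."""
    return n1_gc_per_100ml + n2_gc_per_100ml >= threshold


# Hypothetical records for illustration only
hypothetical_samples = [
    {"sample_id": "site01_wk02", "n1": 5200.0, "n2": 4100.0},  # 9300 combined
    {"sample_id": "site02_wk02", "n1": 1800.0, "n2": 1500.0},  # 3300 combined
    {"sample_id": "site03_wk02", "n1": 4500.0, "n2": 4500.0},  # exactly 9000
]

selected = [
    s["sample_id"]
    for s in hypothetical_samples
    if eligible_for_sequencing(s["n1"], s["n2"])
]
print(selected)  # prints ['site01_wk02', 'site03_wk02']
```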
Reverse transcription was performed using the Midnight RT PCR Expansion kit (EXP-MRT001, Oxford Nanopore Technologies). An input volume of 8 µL of sample RNA was mixed with 2 µL LunaScript RT Supermix, then thermal cycled in the following conditions: 25℃ for 2 minutes, 55℃ for 10 minutes, 95℃ for 1 minute, and hold at 4℃. Midnight Primer pools A and B were then mixed according to manufacturer protocol and aliquoted 10 µL each into a clean 96-well plate. To each primer pool, 2.5 µL of RT reaction was added. Thermal cycling was performed under the following conditions: 98℃ for 30 seconds, 35 cycles of 98℃ for 30 seconds, 61℃ for 2 minutes, and 65℃ for 3 minutes, and hold at 4℃.
Addition of barcodes was performed according to manufacturer protocol using the Rapid Barcoding kit (SQK-RBK110.96, Oxford Nanopore Technologies). For each sample, 2.5 µL from primer pools A and B were combined with 2.5 µL of nuclease-free water in a clean 96-well plate. Then, 2.5 µL of Rapid Barcode was added to the corresponding sample wells. The barcoded plate was incubated at 39℃ for 2 minutes, then 88℃ for 2 minutes. Barcoded samples were then pooled in a clean Eppendorf DNA LoBind tube. Half of the pooled sample was used and half was stored at 4℃ in case it was needed for reloading during the sequencing run. To the pooled, barcoded sample, an equal volume of AMPure XP Beads was added, then mixed on a rotator mixer at room temperature for five minutes. After mixing, the tube was put on a magnet and the beads were washed twice with 1 mL of 80% ethanol. Residual ethanol was discarded and the pellet was resuspended with 15 µL of elution buffer and incubated at room temperature for 10 minutes. Up to 800 ng of DNA library was transferred to a clean tube and combined with 1 µL of Rapid Adaptor F.
The prepared DNA library was combined with 37.5 µL Sequencing Buffer and 25.5 µL Loading Beads and loaded into a MinION R9.4.1 flow cell which was placed onto a MinION Mk1C sequencer (Oxford Nanopore Technologies) for 24-72 hours using the MinKNOW software. Data was analyzed using EPI2ME’s (Oxford Nanopore Technologies) FastqQC+ARTIC+NextClade workflow with the ARTIC nCoV-2019 protocol.
3. Results and Discussion
During the study period, combined N1+N2 gene copies ranged from 1163 to 2.8 million, with a mean of 32,127 and a median of 3607. Of the 210 samples tested from the 13 included sites, 43 met the minimum criterion of 9000 combined N1+N2 gene copies per 100 mL.
Figure 2 shows the normalized N1+N2 gene copies by site over the study period.
All of the samples sequenced contained genetic markers from the Omicron family of subvariants. Of the 43 samples compared, 39.5% had matching results between the GT Molecular PCR kits and ONT sequencing. Of these, 4% were an exact match, and 33.5% were an “assumed” match, meaning that when the GT kit gave positive results for both BA.1 and BA.2, it was possible that BA.4 or BA.5 was present, based on shared mutations among the four subvariants [29, 30]. Seven percent of the samples did not register a positive result in the GT Molecular kit (<LOD) but were assigned a clade (a group of genetically similar viruses) using sequencing. Two of these samples were identified as BA.2 and one as recombinant. The remaining 53.5% of samples did not match between the two methods. The majority of non-matches were assigned as “recombinant” using sequencing (35%). Others were BA.3, XBB, and BQ.1, for which there were no markers in the GT Molecular kit being used (see Table 1).
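The comparison categories described above can be expressed as a small decision rule. The sketch below is one possible reading, not the authors' procedure: it assumes an exact match means the PCR kit was positive for only the lineage that sequencing assigned, and all names and example pairs are hypothetical.

```python
from collections import Counter

ASSUMED_MATCH_PCR = frozenset({"BA.1", "BA.2"})  # kit positive for both markers
ASSUMED_MATCH_SEQ = frozenset({"BA.4", "BA.5"})  # lineages sharing those mutations


def classify_concordance(pcr_positive, sequenced_lineage):
    """Assign one comparison category to a paired PCR and sequencing result."""
    pcr_positive = frozenset(pcr_positive)
    if not pcr_positive:
        return "pcr_below_lod"  # no PCR call, but sequencing assigned a clade
    if pcr_positive == {sequenced_lineage}:
        return "exact_match"  # assumption: single PCR call equals the lineage
    if pcr_positive == ASSUMED_MATCH_PCR and sequenced_lineage in ASSUMED_MATCH_SEQ:
        return "assumed_match"
    return "no_match"


# Hypothetical paired results for illustration only
hypothetical_pairs = [
    ({"BA.2"}, "BA.2"),
    ({"BA.1", "BA.2"}, "BA.5"),
    (set(), "BA.2"),
    ({"BA.2"}, "XBB"),
    ({"BA.1", "BA.2"}, "recombinant"),
]

counts = Counter(classify_concordance(pcr, seq) for pcr, seq in hypothetical_pairs)
percentages = {k: 100 * v / len(hypothetical_pairs) for k, v in counts.items()}
print(percentages)
```

Lineages with no marker in the kit, such as XBB or BQ.1, can only ever fall into the no-match or below-LOD categories under this rule, which mirrors the limitation discussed in the text.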
The period of January through April 2023 saw several Omicron subvariants circulating in the Midwest region of the United States, including BA.2, BA.4, BA.5, XBB, and BQ.1, with the dominant subvariant being XBB [31]. Although BA.4/5 variant PCR kits were available at the time of initial analysis, BA.1/2 kits were still being used while transitioning to genome sequencing. Furthermore, the frequency of BA.2, BA.2.12.1, BA.4, and BA.5, all BA.2 relatives, fluctuated during this transition period, making the BA.2 kits still relevant [30]. The paired sequencing and PCR data illustrate the temporal limitations of using PCR variant kits alone to determine currently circulating COVID-19 variants. While BA.2 was a common subvariant in circulation in the study region during the study period, early instances of XBB (January 18, 2023) and BQ.1 (January 30, 2023) in the region were missed during initial PCR analysis (see Table 2). These results indicate that even if the available BA.4/BA.5 PCR kits had been used, occurrences of newer Omicron subvariants like XBB and BQ.1 would still have been missed because the PCR kits were not capable of detecting those variants at that time.
Figure 3 shows the temporal occurrence of subvariants determined by PCR and ONT methods.
Although more sensitive in detecting key spike protein mutations [16], PCR kits for detecting COVID-19 variants have a significant limitation: they rely on the specific genetic sequences of known variants. Consequently, a variant must already be in circulation before materials can be created to detect it. This limitation means that PCR kits might not identify new variants as they emerge, potentially delaying the detection and tracking of these new strains [32, 33]. Furthermore, recent variants of concern contain more than 30 mutations in the spike protein, which increases their ability to evade detection by standard testing methods like ddPCR [34]. In contrast, genome sequencing analysis incorporates data from global sequencing efforts, often within days of new sequences being submitted [35]. This allows for real-time tracking and analysis of SARS-CoV-2 variants, providing insights into the virus’s spread and evolution without the delay required to develop new detection materials [6, 16].
Another advantage of sequencing wastewater samples is that it can provide data at a population scale in places where sequencing clinical samples is limited by resources [12, 36], especially when overall clinical testing has declined significantly [37, 38]. The Michigan Department of Health and Human Services reported that only two clinical samples from the entire region were sequenced during the study period (S.S., personal communication, June 18, 2024). Given that there were at least five different subvariants circulating at the time, each with potentially differing transmission and virulence characteristics, sequencing only two samples would provide little information about the distribution pattern and evolution of the virus across an expansive geographic region like the Eastern Upper Peninsula of Michigan.
In summary, using PCR for the initial detection and quantification of COVID-19 virus particles in wastewater remains one of the most effective, time-efficient, and cost-efficient methods for monitoring the virus within a population [39, 40]. While PCR-based variant detection kits are highly sensitive to spike protein mutations [16], they depend on known genetic sequences, resulting in delays in identifying emerging variants [32, 33]. In contrast, genome sequencing technologies like ONT offer early insights into new and emerging variants spreading within communities, surpassing the capabilities of variant-specific PCR tests [16]. This study underscores the value of ONT sequencing of wastewater in providing real-time information about dominant circulating variants, equipping health officials with critical data for making targeted and effective public health decisions. Real-time data are particularly crucial in regions like the Upper Peninsula of Michigan, where few clinical samples are sequenced. Both methods provide data about circulating variants, but ONT provides a more complete picture in rapidly evolving COVID-19 scenarios by detecting individual mutations, which allows for the identification of any current variant as well as emerging or yet-unknown variants or subvariants. This is especially helpful during transitions between highly related subvariants like the BA.2 / BA.4 / BA.5 family.
Author Contributions
Conceptualization, M.J., T.N., B.S., and D.W.; formal analysis, M.J.; writing—original draft preparation, M.J.; writing—review and editing, M.J., T.N., B.S., and D.W.; project administration, T.N., B.S., and D.W. All authors have read and agreed to the published version of the manuscript.