2.2. Viral RNA extraction and sequencing
For samples from Örebro, sequencing was performed at one of three laboratories: the Department of Laboratory Medicine, Örebro University Hospital; the National Pandemic Centre (NPC); or the Public Health Agency of Sweden (PHAS).
In Örebro, the sequences were generated using either a MiSeq (Illumina, U.S.) instrument with the ARTIC V3 tiled amplicon enrichment protocol (400 bp amplicons) or a GridION (Oxford Nanopore Technologies, United Kingdom) instrument with the Midnight protocol, based on the ARTIC network protocol (1200 bp amplicons). Both protocols are described in detail by Koskela von Sydow et al.[39]. Consensus sequences from Illumina data were generated with either Ridom SeqSphere+ versions 8.3.1 to 8.5.1[40] or gms-artic version v2.0[41]. Ridom SeqSphere+ was used with the following settings: samtools mpileup (-A -d 1000000 -B -Q 0, v1.12); ivar consensus (-q 20 -t 0.7 -m 10); ivar variants (-q 20 -m 10 -t 0.1). Gms-artic was used with the default settings and “--scheme nCoV-2019” together with “--schemeVersion V3” or “--schemeVersion V4.1”. Consensus sequences for the 1200 bp amplicons were generated with wf-artic versions v0.3.9 to v0.3.24[42] from epi2me-labs using the default settings and “--scheme_version V1200” or “--scheme_version Midnight-IDT/V1”.
For sequencing at the NPC, sequences were generated on the MGI DNBSEQ-G400 using an ultraplex amplicon approach and PE100 library construction[43]. In brief, raw sequencing data were filtered using FastP[44] and then mapped to the reference genome. Primer sequences were trimmed from the alignments using iVar[45], and variant call files were generated using Freebayes[46]. Consensus sequences were generated from the major-frequency base at each position. Low-coverage regions (<30) were masked with N. Deletions (as defined by the variant calls) were first masked, and the sequence was then collapsed at the point of deletion to keep relative genome coordinates.
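The consensus rule described above (major-frequency base per position, low-coverage positions masked with N) can be illustrated with a minimal sketch. This is not the NPC pipeline: the function name `call_consensus`, the input format (one dictionary of base counts per reference position), and the alphabetical tie-breaking are assumptions made for illustration, and the handling of deletions is omitted.

```python
from typing import Dict, List


def call_consensus(base_counts: List[Dict[str, int]], min_depth: int = 30) -> str:
    """Return a consensus string from per-position base counts.

    base_counts[i] maps a base (A, C, G, T) to the number of reads
    supporting it at reference position i (hypothetical input format).
    """
    consensus = []
    for counts in base_counts:
        depth = sum(counts.values())
        if depth < min_depth:
            # Too few reads: mask the position instead of calling a base.
            consensus.append("N")
        else:
            # Major-frequency base; ties are resolved alphabetically,
            # which is an arbitrary choice for this sketch.
            consensus.append(max(sorted(counts), key=lambda b: counts[b]))
    return "".join(consensus)
```

For example, a position with 40 reads supporting A and 2 supporting G is called A, whereas a position covered by only 10 reads is masked with N.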
At the PHAS, the sequences were generated by Genome Sequencer HiSeq (Illumina, U.S.), sequence mode NovaSeq 6000 S4 PE150 XP. High-quality reads were aligned to the SARS-CoV-2 reference genome (isolate Wuhan-Hu-1, MN908947) using the BWA-MEM v0.7.17-r1188 algorithm, and consensus sequences were generated using consensusfixer v0.4[47], requiring at least 15 supporting reads at each position. Positions with less than 15x coverage were filled with Ns.
For samples from Uppsala, RNA extraction, reverse transcription PCR (RT-PCR) and Nanopore sequencing were performed at the Division of Clinical Microbiology and Hospital Hygiene, Uppsala University Hospital, Sweden. RNA extraction was performed according to the manufacturer’s instructions using Chemagic™ 360 (PerkinElmer, U.S.) or eMAG® (bioMérieux, France) instruments. Samples positive for SARS-CoV-2 were detected with an in-house RT-qPCR method or with the BIOFIRE® Respiratory 2.1 plus Panel (bioMérieux, France). After extraction, RNA eluates were stored at −20 °C. The Ct value for each sample, needed for correct dilution of the RNA, was acquired from the in-house RT-qPCR method.
Between the start of the study period and June 27, 2022, the Midnight protocol[48] was used alongside the NEBNext® ARTIC SARS-CoV-2 Companion Kit protocol version 1.0_1/21 (with a few modifications) for library preparation and sequencing. After June 27, 2022, only the ARTIC protocol was used. The modifications to the protocol included the use of 33 PCR cycles and the replacement of the PCR bead cleanup with a 1:10 dilution of all PCR products, as per the ARTIC nCoV-2019 v3 (LoCost) protocol[49,50]. Library preparation was performed with the NEBNext® ARTIC SARS-CoV-2 Companion, Native Barcoding Expansion 96 (catalogue number: EXP-NBD196; Oxford Nanopore Technologies, United Kingdom) and Ligation Sequencing (catalogue number: SQK-LSK109; Oxford Nanopore Technologies, United Kingdom) Kits. Between the start of the study period and June 27, 2022, ARTIC Network SARS-CoV-2 V3 primers were used. After this date, VarSkip Short v2 primers (New England BioLabs, U.S.) and BA.2 Spike-in primers[51] (New England BioLabs, U.S.) were used. This switch was made to improve sequence quality, since issues with the older primer sets had previously been described[38,52]. The DNA concentration of the library was measured using the Qubit HS dsDNA assay kit (Thermo Fisher, U.S.).
Sequencing was performed with the R9.4.1 flow cell (catalogue number: FLO-MIN106D) on a GridION instrument (Oxford Nanopore Technologies, United Kingdom). To reduce the risk of cross-contamination between runs, flow cells were never reused. The MinKNOW software was used with the following settings: high-accuracy base calling, barcodes on both ends, minimum barcoding alignment score = 60, minimum mid-read barcoding alignment score = 50, and trim barcodes.
Analysis of sequence data was performed with Geneious Prime version 2021.1.1[53]. Primer sequences were trimmed using the Geneious Prime BBDuk plugin version 38.84[54] with the following settings: trim = left end, kmer length = 21, maximum substitutions = 3, trim low quality (<10) from both ends, discard reads shorter than 50 bp, and the custom BBDuk options rcomp = f and restrictleft = 32. Sequence alignment and mapping to the SARS-CoV-2 Wuhan-Hu-1 reference sequence (NCBI accession number: NC_045512.2) were performed using the Geneious Prime minimap2 version 2.17[55] plugin with the following settings: data type = “Oxford Nanopore (more sensitive)”, include secondary alignments enabled, maximum secondary alignments per read = 5, minimum secondary to primary alignment score ratio = 0.8 and “remove existing trim regions from sequences” enabled. Consensus sequences were generated using the “Generate Consensus Sequence” function in Geneious Prime with the following settings: minimum coverage = 4 reads, minimum nucleotide frequency = 0.5 and “Trim to reference sequence” enabled.
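The effect of the read-level quality settings above (trim low-quality bases from both ends, then discard reads shorter than 50 bp) can be illustrated with a simplified sketch. This is not BBDuk's trimming algorithm, which is more elaborate; the function name `trim_and_filter` and the plain per-base threshold are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple


def trim_and_filter(
    seq: str,
    quals: List[int],
    min_qual: int = 10,
    min_len: int = 50,
) -> Optional[Tuple[str, List[int]]]:
    """Trim low-quality bases from both ends; drop reads that end up too short.

    Simplified illustration: a base is trimmed while its Phred score is
    below min_qual. Returns None when the trimmed read is shorter than min_len.
    """
    start, end = 0, len(seq)
    # Trim from the left end.
    while start < end and quals[start] < min_qual:
        start += 1
    # Trim from the right end.
    while end > start and quals[end - 1] < min_qual:
        end -= 1
    if end - start < min_len:
        return None
    return seq[start:end], quals[start:end]
```

A 60-base read with three low-quality bases at the start and two at the end is returned as a 55-base read, while a read that falls below 50 bases after trimming is discarded.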
Consensus sequences from both Örebro and Uppsala were uploaded in FASTA format to the Global Initiative on Sharing Avian Influenza Data (GISAID) database[56].
The sequences were assigned Pango lineages according to the Pango dynamic lineage nomenclature scheme[57] using the Geneious wrapper plugin for Pangolin[58] that runs the Phylogenetic Assignment of Named Global Outbreak Lineages (Pangolin) tool[59]. Unaliased Pango lineages used for grouping lineages were acquired using modified versions of R scripts contained in the PangoLineageTranslator tool[60].
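Unaliasing replaces a Pango alias prefix with the full lineage path it stands for, so that related lineages share a common prefix and can be grouped. The sketch below illustrates the idea only; it is not the modified R scripts used in this study. The function name `unalias` is hypothetical, and the alias table is a small illustrative subset of the alias key maintained in the pango-designation repository.

```python
# Illustrative subset of alias prefixes (assumption: a real analysis would
# load the complete alias key from the pango-designation repository).
ALIASES = {
    "BA": "B.1.1.529",
    "AY": "B.1.617.2",
}


def unalias(lineage: str, aliases: dict = ALIASES) -> str:
    """Expand the alias prefix of a Pango lineage name, if it has one."""
    prefix, _, rest = lineage.partition(".")
    full_prefix = aliases.get(prefix)
    if full_prefix is None:
        # Not an alias known to this table: return the name unchanged.
        return lineage
    return f"{full_prefix}.{rest}" if rest else full_prefix
```

With this table, BA.2 expands to B.1.1.529.2, so that all BA sublineages group under B.1.1.529.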
To identify the mutations in the sequences, Coronapp[61] was used to find all mutations across the entire genome. Coronapp is annotation-based, which we found necessary for identifying all mutations in sequences generated by Nanopore sequencing, as these occasionally contain frameshifts.
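As a reminder of why frameshifts matter here, an insertion or deletion within a coding region shifts the reading frame whenever its net length change is not a multiple of three. The check below is a minimal sketch of that general rule; it does not describe how Coronapp works internally, and the function name `is_frameshift` and the allele-string inputs are assumptions made for illustration.

```python
def is_frameshift(ref_allele: str, alt_allele: str) -> bool:
    """Return True if replacing ref_allele with alt_allele shifts the reading frame.

    Assumes both alleles lie within a single coding region. A substitution
    (equal lengths) or an in-frame indel (length change divisible by 3)
    returns False.
    """
    length_change = len(alt_allele) - len(ref_allele)
    return length_change % 3 != 0
```

A single-base deletion, a frequent kind of Nanopore error in homopolymer regions, is flagged, whereas a three-base deletion is not.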