1. Introduction
Precision metagenomics is an approach that customizes the analysis of a metagenomic sample to detect and classify a specific pathogen. Developing culture-independent methods for detecting foodborne pathogens can expedite source tracking and reduce the time to implement corrective measures during suspected case scenarios [
1,
2,
3,
4,
5,
6]. For qualitative analysis of foodborne pathogens, qPCR or metagenomic detection is sufficient to call presumptive positive results, which are then followed by microbiological isolate confirmation and potentially subsequent regulatory actions [
7] (
https://www.fda.gov/media/102633/download, https://
www.fda.gov/media/83177/download). A similar scenario applies to
Cronobacter [
5,
6,
8].
Cronobacter in powdered infant formula (PIF) has been associated with infant illnesses and human clinical cases [
8,
9,
10].
Cronobacter was previously classified as one single species, Enterobacter sakazakii. Since 2008, it has been reclassified as a genus with seven identified species [
11,
12]. Among these species,
Cronobacter sakazakii belonging to ST4 is responsible for over 90% of infant illnesses [
13]. Potential sources of PIF contamination with
Cronobacter spp. during commercial manufacturing include ingredients added after pasteurization, and equipment contamination post-pasteurization, such as spray dryers and fillers [
14]. Current
Cronobacter spp. detection protocols as outlined by the FDA Bacteriological Analytical Manual (BAM) Chapter 29 [
15] employs enrichment in buffered peptone water (BPW), qPCR screening, chromogenic agars for isolation and cultural confirmation by qPCR or biochemical assays. The FDA BAM method does not identify individual species of
Cronobacter. Single colony isolation is used for whole genome sequencing (WGS), which confirms the genus identity and determines the species identity and relationship to other
Cronobacter strains in the WGS database. This entire process can take approximately two weeks of analysis time.
To expedite the analysis time, we aim to investigate Culture-Independent Diagnostics Tests (CIDT) using sequencing methods for the detection and classification of
Cronobacter strains directly from PIF enrichments. This can be accomplished by using short or long-read sequencing. Bertrand et al. (2019) [
16] outlined the challenges associated with short-read sequencing, particularly in accurately assembling complex, highly repetitive genomic regions, especially in multi-species environments. They noted that clustering-based species binning lacks precision for strain-level metagenomic assemblies crucial for outbreak investigations. In contrast, Oxford Nanopore Technologies (ONT) (Oxford, United Kingdom) long-read sequencing provides closed genomes while also providing an affordable and portable platform [
17].
The limits of assembly of the long-read nanopore sequencing for use as a precision metagenomics technique from enrichments was established previously as requiring a minimum of 107 CFU/ml of overnight enrichment [
18]. The main difference between the E. coli precision metagenomic approach from enriched agricultural water and the
Cronobacter from PIF is that E. coli levels in enriched agricultural samples can vary greatly, requiring a certain threshold for successful ONT sequencing to recover a complete genome for further characterization. In the case of
Cronobacter in PIF we have previously demonstrated that the levels are usually between 107 and 109 CFU/ml in overnight enrichments with more than 90% of the enrichment sample being
Cronobacter (our internal study). During the enrichment process,
Cronobacter spp. concentration increases to high levels and can be easily quantified by qPCR, according to the BAM Chapter 29 protocol [
15]. Therefore, the levels are high enough in the enrichment for the sample to be considered almost as a pure cultured strain.
Despite the benefits of nanopore sequencing, including affordability, portability, long reads, and real-time basecalling, the inherent error rate of nanopore sequencing (when using the fast-calling model with the R9.4.1 flow cell and related library preparation kits) precludes the closed genomes from being used for phylogenic analysis. However, in late 2022 ONT released the R10.4.1 flow cell and the rapid barcoding Kit 24 V14 (RBK-114.24), which, together with a newer basecalling model, showed an increase in accuracy (~Q40) [
19,
20] (
https://rrwick.github.io/2023/12/18/ont-only-accuracy-update.html). Additionally, the ONT protocols have improved to allow successful sequence processing of up to 24 samples (genomes of median ~4.5 Mb) in a single flow cell and still obtain a complete closed bacterial genome with high accuracy (>99.9%) and minimal SNP differences from the reference genome, depending on the species (internal study data not published) [
20,
21,
22]. Consequently, in this pilot study, we aim to test whether we can characterize and close the genomes of up to ten
Cronobacter spp. strains that were spiked into PIF and enriched overnight in a single ONT sequencing run.
2. Materials and Methods
Bacterial strains. The bacterial strains used in this study belonged to five different
Cronobacter species (
Table 1). Bacterial strains were cultured in BD Difco™ brain heart infusion (BHI) broth (Thermo Fisher Scientific, Waltham, MA) and stored in BHI broth with 20% glycerol stocks at -80⁰C.
Artificial contamination and sample processing. The powdered infant formula (PIF) samples were obtained locally and previously tested negative for
Cronobacter. For artificial contamination, 25 g of PIF samples were inoculated with ~5 CFU of one strain of
Cronobacter and mixed with 225 mL of buffered peptone water (BPW) (Thermo Fisher Scientific), followed by incubation at 36°C for 24 hours. An aliquot of each enrichment was processed as described in Chapter 29 of the BAM as outlined in
Figure 1. An additional aliquot of 1 ml was taken from the enrichment and processed as described below for Nanopore sequencing.
DNA extraction and
Cronobacter qPCR detection following the BAM procedure. Briefly, after BPW enrichment, 40 mL of enrichment cultures were centrifuged at 3000 × g for 10 min. The resultant pellet was washed and resuspended in PrepMan Ultra and boiled for 5 min. The supernatant was used for qPCR in BAM Chapter 29, which targets the
Cronobacter partial macromolecular synthesis operon: the ribosomal protein S21 (rpsU) gene 3'end and the DNA primase (dnaG) gene 5' end [
23]. Briefly, two µl of the DNA extract was added to 23 µl master mix containing 0.4 µM
Cronobacter forward and reverse primers, 0.15 µM of Internal amplification control (IAC) forward and reverse primers, 0.3 µM of
Cronobacter probe, 0.15 µM of IAC probe, 1X iQ qPCR Supermix (BIORAD, Hercules, CA), additional 2.5 U of Taq polymerase and 3 mM of MgCl2 and 50 nM of ROX passive dye. All primers and probes (
Supplementary Table S1) employed in this study were purchased from IDT (Coralville, IA, USA).
DNA extraction. Genomic DNA from each spiked sample was extracted using the Maxwell RSC Cultured Cells DNA kit with a Maxwell RSC Instrument (Promega Corporation, Madison, WI) according to manufacturer’s instructions for Gram-negative bacteria with additional RNase treatment. DNA concentration was determined by Qubit 4 Fluorometer (Invitrogen, Carlsbad, CA) according to manufacturer’s instructions.
Whole genome sequencing and assembly. DNA recovered from PIF spiked enrichment samples were sequenced using a GridION nanopore sequencer (Oxford Nanopore Technologies, Oxford, UK). The sequencing libraries were prepared using the Rapid Barcoding Kit 24 V14 (SQK-RBK114.24) and run in FLO-MIN114 (R10.4.1) flow cells, according to the manufacturer’s instructions for 48 hours. There were only two deviations from the original SQK-RBK114.24 library protocol: 1) the starting DNA per samples was 200 ng, and 2) the barcoding was done using 18 µl of sample DNA plus 2 µl of each barcode. The run was live base called using Guppy v 7.1.4 included in the MinKNOW v 23.07.12 software using the super-accurate basecalling model. The initial classification of the reads for each run was done using the “What's in my pot” (WIMP) workflow contained in the EPI2ME cloud service. That workflow allows for taxonomic classification of the reads generated by the GridION sequencing in real time.
The genomes for each spiked sample were obtained by de novo assembly using all nanopore data output per sample using the Flye program v2.9 [
24], using the following parameters: --nano-hq, --read-error 0.03, and -i 4. The assembled contigs were classified by taxonomy by Kraken 2 [
25] using GalaxyTrakr [
26].
in silico MLST and serotyping. The initial analysis and identification of the strains were performed using an in silico
Cronobacter spp. MLST approach based on the information available at the MLST website (
https://pubmlst.org/organisms/cronobacter-spp). The in silico
Cronobacter serotype present in each sample was determined using by batch screening in Ridom SeqSphere+ v9.0.8 (Ridom, Münster, Germany) using the genes described in Wang et al, 2021 [
27] and available as part of the CroTrait pipeline (
https://github.com/happywlu/CroTrait).
Phylogenetic relationship of the strains by whole genome multilocus typing (wgMLST) analysis. The phylogenetic relationship of the strains was assessed by a whole genome multilocus sequence typing (wgMLST) analysis using Ridom SeqSphere+ v9.0.8. The genome of C. zakazakii ATCC BAA-894 (NC_009778.1) containing 4103 CDSs was used as a reference. The complete closed genomes of these C. zakazakii strains (NC_017933.1 - ES15,
NC_020260.1 - SP291, NZ_CP011047.1 - ATCC 29544, and NZ_CP012253.1 - NCTC 8155) were used for comparison with the reference genome to establish a list of core and accessory genes. Genes that are repeated in more than one copy in any of the two genomes were removed from the analysis as failed genes. A task template then was created that contains both core and accessory genes for this reference Cronobacter sakazakii strain for any future testing (wgMLST scheme). Each individual locus from strain BAA-894 was assigned allele number 1. The assemblies for each individual Cronobacter closed genome in this study were queried against the task template and if the locus was found and was different from the reference genome or any other queried genome already in the database, a new number was assigned to that locus and so on. After eliminating any loci that were missing from the genome of any strain used in our analyses, we performed the wgMLST analysis. These remaining loci were considered the core genome shared by the analyzed strains. We used Nei’s DNA distance method [38] for calculating the matrix of genetic distance, taking into consideration only the number of same/different alleles in the core genes. The resultant genetic distances were used to generate a Neighbor-Joining (NJ) tree. wgMLST uses the alleles number of each locus for determining the genetic distance and builds the phylogenetic tree. The use of allele numbers reduces the influence of recombination in the dataset studied and allows for fast clustering determination of genomes.
Nucleotide sequence accession numbers. The raw ONT data for each individual sample were deposited in GenBank under BioProject accession number PRJNA1117853.
3. Results
Cronobacter grew in every enrichment culture for all PIF spiked samples. The uninoculated PIF sample continued to test negative following enrichment.
Cronobacter concentrations in each enrichment varied from 107 to 109 CFU/mL (
Table 2). The BAM qPCR method detected the presence of
Cronobacter in each overnight spiked PIF enrichment. A schematic representation of the entire workflow as described in materials and methods is shown in
Figure 1. All spiked PIF samples resulted in positive qPCR signal at Cq values of 12.8 to 18.7 (
Table 2). The enumeration results and qPCR Cq had an excellent correlation with R2 of 0.93.
The DNA extracted from 1 ml of each of the 10 spiked PIF overnight enrichment samples were sequenced using ONT for 48 hours. The sequencing output for the run was 17.16 Gb in 3.25 million reads, of which 94% (3 M) passed the quality filter (minimum quality score of 10). Around 600 k reads were unclassified and were discarded. The read length N50 was 8 kb and there was enough data generated for each of the samples at different coverage levels to perform a successful genome assembly and sample characterization (
Table 3). Reads below 4000 bp and quality scores below 10 were discarded for downstream analysis resulting in 915,281 remaining reads. The estimated coverage per
Cronobacter sample (genome size approximately 4.5 Mb) ranged from 55 – 420 X.
The preliminary
Cronobacter species ID and the number of total reads matching each species per spiked PIF sample were determined using Oxford Nanopore EPI2ME “What’s in my pot” (WIMP) workflow analysis (
Table 2). Only one species was misclassified by WIMP as C. universalis (EB18) instead of the expected C. turicensis. According to the WIMP analysis 90% to 99.6% of the classified reads per spiked PIF sample belonged to the spiked strain (
Table 2). These numbers confirmed what was observed by qPCR for the same samples. According to these values, the concentration of each individual spiked strain ranged from 0.8 x 107 to 0.8 x 109 CFU/ml.
Since the observed
Cronobacter spp. concentrations (
Table 2) in spiked PIF overnight enriched sample were similar to the concentrations observed for pure overnight cultures, we proceeded with the assembly using the reads obtained for each sample as we usually do for a pure culture [
28,
29]. The complete closed genome for each strain was de novo assembled using Flye in our high-performance computing environment (
Table 4). The assembly of the reads for each sample resulted in completely closed circular chromosomes and/or plasmids. The G+C mol% of these assemblies were 56.8% to 58.1%, which is within the range of the reported GC content for
Cronobacter spp. strains [
11]. The genome of some samples (E515 and E603) was composed only of a chromosome without the presence of any extra chromosomal elements. The remainder of the samples carried between 2 to 3 plasmids. Analysis of every contig for each sample with Kraken2 showed that all of them belonged to Cronobacter spp. The chromosome coverage for each sample varied from 65 to 516 X.
in silico MLST analysis (
https://pubmlst.org/organisms/cronobacter-spp) showed that most of these strains belonged to eight known sequence types (STs) (ST1, 4, 8, 15, 19, 54, 80, and 81), with E791 belonging to a novel ST profile (151, 20, 111, 90, 195, 212, 239) (
Table 5). Interestingly, the closest STs to E791 reported in the Cronobacter MLST database matched in 4 loci and they all belonged to the species C. dublinensis (ST477, 479,576, and 662).
The phylogenetic relationship among the 37
Cronobacter spp. genomes (27 completely closed genome representative for each of the seven
Cronobacter species publicly available at GenBank -
Supplementary Table S1, and the 10 ONT assemblies from the spiked overnight enriched PIF samples) was determined by a custom wgMLST analysis (
Figure 2). A total of 3,965 genes were used as templates for the analysis of the
Cronobacter spp. strains. Every species was easily distinguished and the wgMLST confirmed their current species definition. wgMLST analysis grouped the 10 PIF spiked samples according to their species and distinguished differences between strains from the same ST (e.g., ST8, differing in at least 34 alleles). This phylogenetic tree also confirmed that sample EB18 was indeed a
C. turicensis.
4. Discussion
Considering the importance of powdered infant formula (PIF) safety, accurate detection and classification of Cronobacter spp. in PIF are crucial, particularly during outbreaks. Current methods include qPCR and extensive selective plating before whole genome sequencing (WGS) analysis. This time-consuming process requires nearly two weeks for isolate confirmation. The use of qPCR as a screening tool of the pre-enrichment broth and long-read sequencing analysis of qPCR positive samples, allows detection and full characterization of a Cronobacter strain in 3 to 4 days. Although confirmation by microbiological methods remains the standard, this new method can provide faster information to inform risk management decisions, potentially reducing the risk of illnesses by up to a week.
Since levels of
Cronobacter, especially C. sakazakii, in overnight enrichments can reach 107 to 109 CFU/ml and
Cronobacter makes up 90.0 – 99.6% of the culture (
Table 3), samples from overnight enrichments can be treated as pure isolates and sequenced directly without the need of further selective enrichment or plating steps to isolate the
Cronobacter strain from the other microbiota present in the overnight enrichment. Simple ONT sequencing and assembly using all reads has demonstrated in this limited study as adequate to completely close
Cronobacter genomes. The data generated from each sample is manageable (compressed fastq files from 0.3 Gb to 1.8 Gb) and can produce a usable genome in about 0.5 to 2 hours post-run in a high-performance environment. This reduces the overall analysis time compared to traditional culture methods and WGS, allowing detection of related strains in a few days (3 days from enrichment to usable data), compared to approximately 2 weeks (
Figure 1).
Given the high relative populations of
Cronobacter in each enrichment (90.0-99.6%) (
Table 3), short-read sequencing could satisfactorily identify and characterize these strains, but a complete genome would not be recovered. Short reads would capture most of the genome, but some fragments and repetitive regions might be missed. In contrast, ONT allows for closing the complete genome (chromosome and/or plasmids) (
Table 4). ONT also enables positive identification of most
Cronobacter species in the PIF sample within just 1 hour, except for C. turicensis. This highlights a problem with accurate taxa identification using curated databases (In this case RefSeq Database created on 8 June 2021). If the reference genome defining a species is missing in the database, proper identification may not be achieved. Regular updating of databases improves their accuracy and comprehensiveness. Without these regular updates, these databases might lack essential information, leading to incomplete or incorrect identification of microorganisms.
The observed STs in this study agreed with the expected results for the spiked taxa per PIF samples (
Table 1 and
Table 5). The
Cronobacter MLST database currently holds 4,150 strains, with varying numbers for different STs (e.g., 499 strains belong to ST1 (reported as C. sakazakii), 615 strains to ST4 (reported as C. sakazakii), 125 strains to ST8 (reported as
C. sakazakii), 1 strain to ST15 (reported as
C. sakazakii), 7 strains to ST19 (reported as
C. turicensis), 8 strains to ST54 (reported as
C. universalis), 7 strains to ST80 (reported as
C. dublinensis), and 13 strains ST81 (reported as
C. muytjensii). In silico serotype analysis also matched expected results, showing high diversity of serotypes among the samples and within the same species (e.g., two different O types for
C. sakazakii strains). ONT rapidly differentiated them using in silico MLST, which is crucial since most
Cronobacter sakazakii causing more than 90% of infant illnesses belong to ST4 [
13], including recent cases linked to PIF and breast pump equipment caused by ST4 strains [
6].
It is important to use qPCR for early screening and quantification to determine the approximate concentration of
Cronobacter in PIF enriched samples, which helps estimate the number of samples per flow cell for ONT sequencing. A previous study of ONT sequencing for Culture-Independent Diagnostics Tests (CIDT) of STECs in agricultural water overnight enrichments determined that a minimum of 107 CFU/ml of STEC in a complex mixture was needed for successful genome closure using a single flow cell [
18]. This study showed that when the target organism reached concentrations above 107 CFU/ml and 90-99.6% of the sample, up to 10
Cronobacter genomes could be successfully closed with high coverage (100 – 300X). This suggests that more samples could be included, reducing the cost per sample. At current prices, the entire run costs around
$860 (flow cell –
$700, Sequencing kit library –
$110, and DNA extraction –
$50), with a cost of 86 USD per sample. Bulk purchasing of flow cells and running 24 samples instead of 10 could reduce the price per sample to around
$25. The
Cronobacter ONT method from PIF enriched samples has an advantage over STEC enrichments, as a positive PIF sample usually results in a single organism, reducing sample complexity [
18] .
The pilot study aimed to test nanopore sequencing for preliminary identification and characterization (MLST, serotyping, and wgMLST) within approximately 3 days from sample collection, comparing results to reference genomes (
Table 1 and
Figure 2). Sequencing with the new R10.4.1 flow cell showed higher accuracy than R9.4.1 as observed by many others [
17,
18,
30]. Strains with previous sequences in NCBI clustered tightly together, with varied numbers of loci differences (
Supplementary Table S1 and
Figure 2). For instance, sample E477 (ST8) differed in at least 3 loci from ATCC 29544 (ST1). However, issues such as indels and missing genes highlight the need for cautious interpretation and robust validation of ONT data compared to other sequencing methods like Illumina.
Currently, WGS in bacterial-related diseases focuses on sequencing microorganisms isolated from selective culture plates [
6,
31,
32,
33]. Clinical and diagnostic labs are transitioning to Culture Independent Diagnostic Tests (CIDTs) for rapid identification, but CIDTs do not yield physical isolates, hindering outbreak investigations by public health agencies. This in turn prolongs outbreak resolution [
34]. Similarly, isolating foodborne bacteria for surveillance or case investigations, such as
Cronobacter, takes around 1 week per isolate before WGS can be performed. There's a push for rapid, cost-effective, and highly accurate culture-independent methods for both clinical and food investigations [
2,
18,
34,
35]. Previous metagenomic approaches yielded closed STEC genomes but weren't suitable for SNP-based source tracking [
18]. This study using newer ONT chemistry (R10.4.1) for
Cronobacter identification and subtyping directly from overnight PIF enrichments shows higher quality than earlier results obtained for STEC using the (R9.4.1 flow cells). We accurately identified each spiked strain (MLST and serotype) and subtyped nearly to the strain level (wgMLST). Similar results have been observed by other authors for certain strains of
Listeria monocytogenes and STEC [
21] and Staphylococcus [
22], showing the improvement of ONT sequencing for outbreak investigations. However, protocols for nanopore sequencing, DNA extraction, and library preparation still vary, and validation for repeatability, reproducibility, and robustness is needed before implementation.