1. Introduction
Enterococci are Gram positive firmicutes comprising commensals and pathogens in a wide range of vertebrates, and plants [
1,
2], that are prevalent in animal gastrointestinal tracts [
1,
2,
3,
4], nature [
1,
5], and even food processing systems [
6]. Some species are found in diverse vertebrates, such as
Enterococcus faecalis, Enterococcus faecium, and
Enterococcus avium, which have been isolated from both mammals and aves [
1,
2]. In humans these species have been associated with infections of the urinary tract, blood stream, heart, and central nervous system [
7,
8]. Some Enterococcal species appear to be restricted to particular host species or classes [
2].
Enterococcus cecorum is primarily associated with avian species and has been isolated from the intestines or cloaca of galliforms and neoornithes [
1,
2,
9]. Comparative genomics based on the Enterococus core genome places
E. cecorum most closely related to
Enterococcus columbae (host birds) and within a clade associated with plant, aquatic and bird hosts [
2].
E. cecorum typically colonizes the intestines of chickens around 3 weeks of age [
10,
11,
12]. Pathogenic strains have been associated with osteomyelitis affecting the flexible thoracic vertebrae (kinky back) and the proximal femoral (femoral head necrosis) or tibial (tibial head necrosis) growth plates. Borst
et al., [
13] identified a cluster of twelve genes that appear to distinguish osteomyelitis isolates from commensals, likely to be involved in virulence through capsular modifications. Evolution of pathogenesis and drug resistance in
E. faecalis and
E. faecium, have been associated with mobile elements and acquisition of capsular gene clusters [
14,
15]. Phylogenomic analysis of more than 100 genomes of
E. cecorum isolates from French broilers suggested that a specific lineage of pathogenic isolates is infecting broilers in Europe and the United States [
16].
The recent occurrence of sepsis outbreaks caused by
E. cecorum is characterized by onset during the first few weeks post hatch, prior to the normal colonization of the intestinal tract [
17]. However, there are older reports of sepsis outbreaks with losses of 5-8%, characterized by pericarditis, bacteremia, and osteomyelitis [
18]. Mechanisms of
E. cecorum horizontal or vertical spread in flocks are not known, although particular facilities or flocks may experience repetitive outbreaks [
17,
18]. Recent outbreaks may be more problematic with the removal of early administration of antibiotic growth promoters in flocks [
17,
19,
20]. Questions we wished to address included: whether specific mutations, or gene acquisition are driving recent sepsis outbreaks, whether sepsis and osteomyelitis isolates are distinct lineages, and whether high resolution bacterial genomics can inform responses to sepsis outbreaks. Answers to these questions will be critical for developing treatments, vaccines, or management strategies for mitigating sepsis outbreaks.
3. Results and Discussion
Enterococcus cecorum Isolates from commercial operation samplings were obtained from poultry diagnostic laboratories representing samplings from 2020 through 2021 (
Table 1). These included environmental samples as well as necropsy samples, including internal organs. We also included two bacterial chondronecrosis with osteomyelitis (BCO) isolates from our own sampling of commercial broiler farms [
22], and our poultry research farm (UAPRF). Total DNA was purified from each isolate and submitted for Illumina 2x151 sequencing. Reads were trimmed and assembled as described in Materials and Methods. The assemblies (
Table 1) ranged from 32 to 105 contigs with an average of 62 contigs. Total assembled base pairs ranged from 2.15 to 2.49 Mbp with an average of 2.27. The N50 values ranged from 49.7 to 209.1 kbp with an average of 108.9. These 34 genomes along with 193
E. cecorum genomes deposited in NCBI were used to generate phylogenomic trees to discern the relationships and possible origins of the BCO and sepsis isolates. A major issue in any of these analyses is knowing the phenotypic trait of each isolate regarding virulence/pathogenicity. BioSample data from NCBI may not clearly identify host disease state or the actual anatomical source of the culture. For example, if an isolate was obtained from bone marrow should that be classified as a case of sepsis or BCO? If an isolate was obtained from hatchery egg waste from an infected breeder flock, the virulence potential of that isolate is unknown. We therefore made every effort to be strict in our interpretation (see legend to
Figure 1) as to whether an isolate was from BCO (osteomyelitis or bone marrow) or from sepsis (peritoneum, heart, liver, blood). For the purposes of our analyses, we reserved sepsis isolates to be those isolated from an internal organ or compartment while BCO isolates were from bone marrow or infected joint. We also defined a Chicken Disease group (145 genomes) as BCO isolates plus the strictly defined sepsis isolates which excludes isolates from air sac, egg waste, environmental swabs, intestines, cecum, and unspecified clinical samples. All genomes not in Chicken Disease were assigned to the None phenotypic group. Strain assignments to phenotypic groups are listed in
Table S1.
Phylogenomic trees were generated for all 227 genomes using either kmer genome comparisons using PopPUNK (
Figure 1) or shared core genome SNPs using ParSNP (
Figure S1). Although branch lengths were different for the two methods the overall topology was similar, and clustering very similar. Regardless of using a strict or relaxed definition of BCO or sepsis the phylogenomic analyses all support a polyphyletic origin for both the BCO and sepsis isolates. Red circles for BCO appear in all branches and the green triangles/circles for chicken sepsis appear in multiple locations. There is one large cluster (A) of US sepsis isolates that is related to a BCO isolate we obtained from tibial pus from a broiler in 2016 (1415). Within this cluster are two vertebral osteomyelitis isolates (LB5924 & LB5925). Cluster B contains LB5800, a sepsis isolate from heart, four closely related isolates from egg transfer residues (LB5857, LB 5916, LB5876), an isolate from a case of air-saacculitis (LB5880), and two distantly related isolates, a BCO femoral head necrosis isolate (1675) and an intestinal isolate (LB5923). Cluster C contains one sepsis isolate (1750) and two vertebral osteomyelitis isolates (LB5836 & LB5843). Cluster D contains no evidence of BCO or sepsis isolates as it consists of isolates from cull eggs, or egg transfer residue, along with one isolate (LB5922) with incomplete documentation of isolation source. The tree contains 100 genomes from isolates collected from 16 poultry farms in France from 2007 to 2017. Some of these genomes are from probable sepsis cases (red circles) with many representing pericarditis isolates [
16]. These isolates appear throughout the tree with clusters including BCO and probable sepsis virulence phenotypes. There are also six clinical isolates from humans (green plus nodes,
Figure 1 and
Figure S1) that are distributed amongst the chicken isolates. Of note, the phylogenomic analyses also included cloacal isolates from Harris Hawk (ESM13C1 & ESM13AIC1), and black vulture (E1062, ESM877AIC1), pig feces (WCA-380), and internal organs from duck (D-104), and goose (G-29). The Harris Hawk isolates appear to be the most distant from the other 225 isolates. While the goose G-29 isolate is within the branches leading to cluster D. The pig and one of the black vulture isolates bracket a solitary USA sepsis isolate, 1752, The duck internal organ/sepsis isolate D-104, isolated in Poland, places in a cluster with French sepsis and BCO isolates. We interpret these data as most consistent with pathogenic isolates arising from non-pathogenic isolates and that sepsis isolates are closely related to BCO isolates. Our conclusions contrast with those from previous analyses of smaller collections of genomes that suggested distinct lineages for commensal and pathogenic isolates of
E. cecorum [
13,
16,
37].
If osteomyelitis and/or sepsis are polyphyletic then this could be driven by gene acquisition and/or by mutation. Based on comparison of genomes of three BCO isolates with three cecal isolates, Borst
et al. [
13], proposed that a cluster of 12 genes possibly related to capsular modification distinguished the BCO isolates from the cecal isolates. Using the SA1 assembly of Borst as our reference genome, the Prokka annotation places the 12 gene cluster as protein encoded genes (PEG) 312 through 323. We used BLASTp to search the Prokka predicted polypeptides from all 227
E. cecorum genomes for these 12 polypeptides. BLASTp results were tabulated combining values for percent identity and query coverage (pid x qcov/100) and averaged across all genomes according to phenotypic group: None, Chicken Disease, BCO, Strict Sepsis (summarized in
Table 2 with complete results in
Table S2). The results show that the cluster of capsular genes are conserved in approximately 70-80% of the isolates classified into Chicken Disease and for those classified as BCO, the conservation may be higher in strict sepsis group, but the number of those isolates is significantly lower (34 vs 113). Additionally, presence of these capsular genes in the isolates within the None group (no disease or disease association not known) is around 30 to 40% of the isolates.
Unicycler assemblies for our 34 assemblies were surveyed for circular elements. This identified 3 potential plasmids in isolate 1754. We used BLASTn searches of all 227 genomes to determine how widespread these contigs are, as a signal for horizontal genetic exchange. Contig 25 (6237 bp) from 1745 appears to be more than 75% conserved in 24 assemblies: 1415, 1754, An144, ARS48, ARS60, ARS65, CHD182EN, CHD270EN, CHD278EN, CHD341EN, CHD347EN, CHD83EN, CHD96EN, CIRMBP-1218, CIRMBP-1233, CIRMBP-1259, CIRMBP-1276, CIRMBP-1282, CIRMBP-1284, CIRMBP-1285, CIRMBP-1294, PS10, PS2, and SA2. Contig 27 (4525 bp) is more than 75% conserved in 10 assemblies: 1415, 1754, An144, ARS60, CHD83EN, CHD96EN, CIRMBP-1234, CIRMBP-1238, CIRMBP-1273, and PS2. Contig 31 (2418 bp) is more than 75% conserved in only the 1754 genome; the closest related contig is in An144 and only 71% conserved. The isolates containing homologous contigs are scattered around the phylogenomic tree (
Figure 1), but sometimes found in highly similar genomes, indicative of both horizontal transfer or shared origins. Only some genomes containing contig 25 homologs contain contig 27 homologs (and vice versa). Analysis of contig 25 using Prokka annotation, and BLASTp of NCBI NonRedundant indicates six predicted CDS that annotate as four hypothetical proteins, a replication RepB initiation protein, and a plasmid recombination protein. Similarly, for contig 27 there are four annotated CDS: a hypothetical protein, a plasmid recombination protein, a CopG transcriptional regulator, and a replication RepB protein. Contig 31 annotates as three CDS: a hypothetical protein, a transcriptional regulator, and a replication RebB protein. Therefore, mobile plasmids appear to exist in
E. cecorum but do not seem to be widespread or the drivers of the emergence of sepsis. However, this is only based on identification of circular elements from the Unicycler assemblies in the 34 genomes we have generated.
Our Prokka-Roary analysis of 227 genomes suggests that the core+ pan genome for this species contains 14728 genes with 878 genes in the strict-core genome (99%≤strains≤100%), 335 genes in the soft-core genome (95%≤strains<99%), 1689 genes in the shell genome (15%≤strains<95%), and 11826 cloud genes (strains<15%). Prior analysis of 140 genomes had suggested a pangenome of 8523 gene clusters with 1207 genes in the strict-core genome, 4664 genes in the accessory genome, and a unique genome of 2,652 [
16]. Thus, our results appear to expand the pangenome by over 6000 genes with similar gene content for the sum of the strict-core and soft-core to the pervious numbers for a strict-core genome. Scoary analyses of the pan genome with respect to phenotypic traits (
Table S2) were employed to determine whether any particular loci, or gene clusters had high specificity for either Chicken Disease isolates or for Strict Sepsis. We also analyzed for Chicken Disease which compared BCO isolates, along with the sepsis isolates, to the isolates for which there is no disease phenotype known. Since many of the isolates for which there is no disease phenotype known may actually include pathogenic isolates, we were skeptical of any meaningful results for that particular comparison. The data for loci positively or negatively associated with Chicken Disease identified 195 genes with corrected P-values <0.05 (
Table S3). Of the 159 loci positively associated (Odds ratio≥5) with Chicken Disease trait, 63 annotated as something other than hypothetical protein. Of the 20 loci negatively associated (Odds ratio≤0.2) only 6 annotated as something other than hypothetical protein. For these 159 loci we identified 12 clusters of two or more loci based on position in the genomes for either of three BCO isolates. Seven of the locus clusters were present and clustered in all three isolate genomes examined (
Table S3). Functions associated with these clusters include glycosylation loci including the capsular genes identified by Borst [
13], carbohydrate/sugar transferase systems, probable transposable elements, and vitamin/biotin transferase systems. When we analyzed all the genomes for loci associated with the Strict Sepsis group, the data identified 101 loci with 97 positively associated and 4 negatively associated (
Table S4) However, of these loci 94 were apparently identified because they are found in primarily in the genomes present in cluster A in
Figure 1. To reduce the bias of Cluster A we reduced representation from that cluster by excluding some of the Cluster A isolates to generate the trait group Sepsis Strict Reduced (
Table S1). The Scoary output using Sepsis Strict Reduced indicated that there were no loci with Bonferonni corrected P-values <0.05 (
Table S5). Inspection of this data identifies 24 loci with sensitivity ≥50 (measure of frequency of presence in isolates with the trait), but only 10 of those loci had a specificity score ≥50 (measure of how specificity the gene is isolates with the trait). Of those 10 loci with higher specificity, five annotate as hypothetical protein and another a transposase. The other four loci encode functions in vitamin B12 import, ascorbate uptake, hexulose conversion, and a transcriptional regulator for glucosamine utilization. We interpret these Roary-Scoary results to indicate that there is no evidence of specific gene acquisitions that are shared by the majority of sepsis isolates. Therefore, either there are no specific gene acquisitions that can convert a BCO pathogen into a sepsis pathogen, or individual clades have specific independent gene sets that must be acquired for transition to a sepsis pathogen based on their particular pan genome composition.
To assess clade specific gene acquisition we used Scoary to analyze gene presence/absence for sepsis and BCO isolates in cluster 1 comprised of French isolates (trait CIRMBP Cluster 1 in
Table S1). This compares four sepsis isolates to 22 BCO isolates within this clade. The Scoary results (
Table S6) identified only seven genes, with six annotated as hypothetical proteins and one a transposase. None of these loci had significant corrected P-values for association but did show high specificity scores. Five of the loci are clustered in at least two of the sepsis isolate genomes and Phaster results show this region is likely a prophage (contig 1 from 109451-157474). Thus, our analyses failed to reveal compelling evidence for gene acquisition driving the emergence of sepsis.
For the transition from BCO pathogen to sepsis pathogen, an alternative to gene acquisition through horizontal transfer, is that the transition involves mutation of resident genes. Evidence from sequence analysis of ancient Marek’s Disease Virus strongly support that this virus has become more pathogenic not by gene acquisition or gene rearrangement, but through point mutations [
38]. Emergence of sepsis through point mutations would also be consistent with the polyphyletic origins of sepsis isolates (
Figure 1). Existing pathogens or commensals could experience a few key mutations which would allow them to survive in blood and colonize organs.
We first used ParSNP to identify and determine core genome SNPs that are differentially represented in BCO vs sepsis isolates. The ParSNP analysis was limited to those genomes known to be from cases of chicken disease (scored as 1 in Chicken Disease;
Table S1). For this analysis we also employed the same reduced representation from Cluster A (exclude NA isolates as in Sepsis Strict Reduced;
Table S1). This compared 17 sepsis isolates to 113 BCO isolates using the finished/complete genome BCO isolate SA1 as the reference. Since the reference genome was a BCO isolate, only SNPs over-represented in sepsis should represent mutations favoring sepsis. The core genome SNPs (n=78150) were filtered for those present in >94% of Sepsis Strict Reduced isolates (n=967) and then for frequencies ≥ 0.30 in the sepsis isolates relative to the BCO isolates. This identified 34 SNPs; 32 bi-allelic and 2 tri-allelic (
Table S7). These 34 potentially diagnostic SNPs occur in only six genes. Eight SNPs are in first or second codon base positions and most likely to be missense while 26 are in the wobble position and more likely to be silent. Interestingly, four of the genes (Smc, YaaA, hypothetical protein, and AldC) appear to be clustered, as they represent annotated PEG 229, 230, 231 and 234.
We analyzed the same reduced representation Chicken Disease genomes using Maast as an alternative to ParSNP because the two programs use different alignment algorithms to genotype SNPs and are known to produce different results [
35]. The reference genome was again BCO isolate SA1. The maximum SNP genotype frequencies for Sepsis genomes from Maast were 0.54 (
Table S8); considerably lower than the 1.00 SNP frequencies identified by ParSNP (
Table S7). The Maast genotype data were filtered for SNPs >50% in Sepsis (n=252) and then frequency difference (sepsis - BCO) ≥ 0.20; resulting in a total of 77 SNPs (74 biallelic and 3 triallelic); 7 intergenic, and 70 in coding sequences for 51 protein encoding genes (
Table S8). Fifteen of the genes that flank or span the SNPs annotated as hypothetical proteins, but none of these hypothetical genes is PEG 231 identified in the ParSNP analyses. Of the four clustered genes from the ParSNP data, only the Smc gene was identified also in the Maast analyses. Both ParSNP and Maast identified SNPs in PTS system fructose-specific EIIABC component. Potential clusters of SNPs identified in the Maast analyses include those affecting PEG 346 to 373, 570 to 577, 629 to 637, 744 to 753. There is also the potential for clustering of genes affected for seven SNPs affecting four genes from PEG 594 to 603.
We extracted and aligned the encoded polypeptides for the six genes from ParSNP and 12 additional genes from Maast to identify all variant polypeptide positions (
Table S9). The analyses included predicted polypeptides from all chicken disease isolates and included those from Cluster A that had been excluded from the initial ParSNP and Maast analyses. Each variant position was scored for frequency in the Sepsis Strict group of isolates relative to the BCO isolates. This identified 114 polypeptide variants where Sepsis percentage was greater than 20 points higher than the percentage in BCO (green highlight in
Table S9). The highest frequency protein variations were the R555Q and A590V substitutions in PTS system fructose-specific EIIABC which were found in 75% of sepsis isolates but only in 33 and 29%, respectively, of BCO isolates (
Table S9). All 114 polypeptide variants were also scored for frequency in the Sepsis Strict Reduced (to reduce influence from the higher number of sepsis isolates from Cluster A). Of the 114 protein variants there were 15 that were ≥ 50% in Sepsis Strict Reduced in 7 genes (yellow highlight in
Table S9). Those 15 SNPs are summarized in
Table 3 and are candidates to be key mutations driving the adaptation from BCO to sepsis.
Annotation of these seven genes and the literature support possible roles in virulence of
E. cecorum. Hypothetical protein PEG231 was identified in the ParSNP analysis (
Table S7). BLASTp analysis at uniport.org identifies this as a serine aminopeptidase S33 domain-containing protein. Hip1 in
Mycobacterium tuberculosis is a S33 serine aminopeptidase that inhibits macrophage and dendritic cell functions [
39]. PTS (phosphotransferase transport system) fructose-specific EIIABC has been found to regulate virulence expresssion and stress response in
Lysteria monocytogenes [
40], and biofilm and endocarditis in
Enterococcus faecium [
41]. CBCL1 orthologs in
Pseudomonas aeruginosa are involved in production of quorum sensing signals [
42]. BLASTp at uniprot.org identifies ElaA as a GNAT family N-acetyltransferase, a family of proteins involved in bacterial adaptation to diverse habitats [
43]. EttA regulates protein synthesis in energy-depleted cells and is critical for bacterial survival in long-term stationary phase [
44]. RpoN encodes a sigma factor, σ
54, for directing RNA polymerase to alternative promoters. Orthologs have been shown to regulate virulence determinants including pili, flagella, and type III secretion systems (reviewed in [
45]). YheH is known to function in the signaling pathway for sporulation in
Bacillus subtilis [
46]. Therefore, the genes identified have to do with environmental response and gene regulation. Further, there are at least an additional 39 genes from the Maast analysis that we have yet to assess whether they have coding variations differentially present in sepsis vs BCO isolate genomes (
Table S8). There are also intergenic region SNPs that could affect regulation of flanking genes.
Future work should be focused on obtaining additional isolates from sepsis outbreaks to determine how much genomic diversity is present in each outbreak, and to assess whether the same genotypes are present in successive outbreaks in a facility. Previous work has failed to identify vertical transmission of
E. cecorum using standard culture methods [10; 18]. More sensitive DNA-based methods need to be applied to breeder hens, hatcheries, embryos, and broiler houses [
47]. Virulence testing of isolates is also problematic. Others have relied on chicken embryo lethality assays for evaluation of
E.cecorum isolates [
17,
48,
49], but the direct relevance to pathogenicity post hatch has been a concern [
50]. Experimental infections of young chicks with isolates of
E. cecorum have shown some promise [
10] but have not been used to evaluate relative virulence of different isolates. Some insects have been used for virulence assays of human isolates of other Enterococcus species (reviewed in [
51]). Alternative models could be used to discern whether sepsis isolates and BCO isolates have measurable differences in pathogenesis. Further, many of the genomes in the national databases are poorly documented for specific clinical source and host disease state, thus the need for expanding the available genomes from well documented cases.