3.2.1. Structural ORFs
The structural ORFs of BIS08 have a mosaic arrangement, with different proteins showing homology to different phages. The majority of these proteins are closest to Lederbergvirus BTP1. This module can be divided into three main categories:
1) Tail spike and tail fiber proteins (ORF 5 and 20, color coded blue, Figure 2), which are located at the start and end of the structural module. The tail spike protein (ORF 5, 1986 nt, 662 AA) has maximum nucleotide homology only with Lederbergvirus BTP1 (97.63 % similarity with 100 % query cover); for all other phages, only 27 % of the nt sequence has 98 % homology and the rest did not match. This protein serves as an adhesin that attaches to the surface O-antigen residues on the Salmonella cell surface [25], [26]. The same was true for the amino acid sequence of ORF 5, which has maximum homology with BTP1 (96.07 %), whereas all others have ≤ 70.9 % homology. The second protein, a tail fiber protein (ORF 20, 405 nt, 135 AA), has maximum homology with Salmonella phage S18, SE16 and epsilon 34 (97.76 % nt and 96 % AA homology in BLASTp).
2) Capsid morphogenesis and stabilization proteins (ORFs 6-17). These include four ORFs (ORF 6, 7, 8 and 9) involved in DNA transfer to the host cell; these proteins are released from the phage particle upon infection. This module also includes a head decoration/assembly protein (ORF 10, 152 AA), a tail needle protein (ORF 11, 234 AA), a head closure Hc3-like protein (ORF 12, 473 AA), a head-tail adaptor Ad3-like protein (ORF 13, 167 AA), a hypothetical protein (ORF 14, 187 AA), the major capsid protein (ORF 15, 431 AA), and a head scaffolding protein (ORF 16, 304 AA). All of these proteins had high similarity with Lederbergvirus BTP1 except for the head-tail adaptor protein (ORF 13), which had no homology with BTP1 and was 100 % identical to Enterobacteria phage ST104.
3) Proteins involved in DNA transport into the head, namely a portal protein (ORF 17, 726 AA) and the DNA packaging machinery, the terminase large subunit (ORF 18, 487 AA) and small subunit (ORF 19, 153 AA), colored turquoise blue (Figure 2). The portal protein is conserved and, like the other structural proteins, has maximum homology with BTP1. However, the most peculiar feature of BIS08 is the presence of a unique terminase enzyme. The terminase large subunit of BIS08 (ORF 18, 487 AA, 1461 bp) had only one homolog at the protein and nucleotide level, Salmonella phage ST-35 (Accession No CP051279.1). All other members of the Lederbergvirus genus had only 10 % query cover and a 33 % similarity index with the BIS08 large terminase. The same was true for the small terminase subunit, which exhibits 66 % amino acid homology to three other members of the unclassified Lederbergvirus genus and has only one nucleotide homolog (100 %), Salmonella phage ST-35.
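The nucleotide and amino acid lengths quoted above can be cross-checked with simple codon arithmetic. The following is a minimal sketch, assuming the reported nucleotide lengths exclude the stop codon (an assumption, since the text does not say); the function names are hypothetical and only the three ORFs for which both lengths are given are included.

```python
# Reported (nucleotide length, amino-acid length) pairs taken from the text.
REPORTED_ORFS = {
    "ORF 5 (tail spike)": (1986, 662),
    "ORF 20 (tail fiber)": (405, 135),
    "ORF 18 (terminase large subunit)": (1461, 487),
}


def expected_aa_length(nt_length, includes_stop=False):
    """Convert a coding length in nucleotides to a protein length in residues.

    includes_stop=False is an assumption: the nt length is taken to cover
    only the codons that are translated into amino acids.
    """
    if nt_length % 3 != 0:
        raise ValueError("coding length is not a whole number of codons")
    codons = nt_length // 3
    return codons - 1 if includes_stop else codons


def check_orfs(orfs):
    """Return {name: bool}; True when the nt and AA lengths agree."""
    return {
        name: expected_aa_length(nt) == aa
        for name, (nt, aa) in orfs.items()
    }


if __name__ == "__main__":
    for name, ok in check_orfs(REPORTED_ORFS).items():
        print(f"{name}: {'consistent' if ok else 'check lengths'}")
```

Under this assumption all three reported pairs are consistent (for example, 1986 / 3 = 662).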
3.2.4. Hypothetical and Characterized Proteins of BIS08
In total, BIS08 encodes 72 ORFs, of which 26 (roughly 36 %) are hypothetical proteins with no known function (color coded ash-grey in the genome map, Fig 2). Six of these proteins had no BLASTn homology and were present only in BIS08 (ORF 23, 24, 49, 50, 55, and 62, Fig 2). The other 20 ORFs had varying degrees of similarity to phages from unclassified Lederbergvirus. An equal number of BIS08 proteins (26 ORFs) have BLAST homology with known proteins in the database. These proteins are color coded red in Figure 2. They include LPS modification enzymes for phage binding, such as a bactoprenol-linked glucose transferase (ORF 3, Fig 2) and an acetyltransferase involved in modification of the surface O antigen (ORF 4, Fig 2). Both proteins are encoded adjacent to the tail spike protein (ORF 5, Fig 2) and have high homology with Salmonella phage MG40. A KilA-N domain containing protein (ORF 27), an enzyme involved in fatty acid synthesis and degradation, is encoded adjacent to the lysis cassette (Fig 2).
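The ORF counts in this paragraph can be tallied in a few lines. This is only an illustrative check of the arithmetic, using the numbers quoted in the text; the variable and function names are hypothetical.

```python
TOTAL_ORFS = 72
HYPOTHETICAL_ORFS = 26
# Hypothetical ORFs with no BLASTn homology (present only in BIS08).
UNIQUE_TO_BIS08 = [23, 24, 49, 50, 55, 62]
# Hypothetical ORFs with some similarity to unclassified Lederbergvirus phages.
HYPOTHETICAL_WITH_HITS = 20


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    share = percent(HYPOTHETICAL_ORFS, TOTAL_ORFS)
    print(f"hypothetical proteins: {share:.1f} % of all ORFs")
    print("subgroups add up:",
          len(UNIQUE_TO_BIS08) + HYPOTHETICAL_WITH_HITS == HYPOTHETICAL_ORFS)
```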
The genome also possesses a lambda-like “nin region” between antitermination protein Q (ORF 32) and phage DNA replication protein O (ORF 44). This region includes 11 ORFs, half of which have no BLAST homology with known proteins and are listed as hypothetical proteins (ORF 35, 39-42). The proteins in the other half are related to lambda NinR region proteins [NinZ (ORF 33), NinG (ORF 34), NinF (ORF 36), NinE (ORF 37) and NinB (ORF 38)]. The functions of the majority of “nin region” proteins are not known; however, NinB (orf) and NinG (rap) are known to influence lambda phage recombination when replication stops. These genes are present in epsilon 34 and other related phages of the unclassified Lederbergvirus genus. The same region also has a highly conserved DNA helicase (ORF 43).
ORFs associated with control of the lysogenic-lytic switch are present on the right-hand side of the genome. This region includes an anti-termination protein N (ORF 52) that activates the lytic switch by suppressing transcriptional terminator activity and allowing production of phage proteins. This protein has high homology to phage epsilon 34. Adjacent to it, ORF 53 encodes an HNH endonuclease with the highest similarity to Salmonella phage P6. This region also has a putative recombinase (ORF 56) and a P-loop containing ATPase, a class of enzymes that can catalyze diverse cellular functions such as DNA repair, signal transduction and membrane transport. This ORF has high similarity with Escherichia phage APC_JM3.2 (96 %). A putative ssDNA binding protein (ORF 58) with high similarity to Escherichia phage HK620 and a protein involved in repression of the RecBCD pathway (ORF 59) with highest similarity to Salmonella phage SE1 are also present.
Although no virulence genes were identified in BIS08, it possesses a putative eae gene (ORF 61) that has high homology only with Salmonella phage PM43. The eae gene encodes the attaching and effacing factor in enteropathogenic E. coli strains. This gene was smaller in BIS08 than the eae gene of E. coli and of Escherichia coli Phage vB_EcoS Sa179lw. Escherichia coli Phage vB_EcoS Sa179lw is one of the temperate phages whose presence in E. coli is implicated in pathogenicity [27]. The BIS08 eae gene has low homology (78 % query cover and 28 % similarity) to that of Escherichia coli Phage vB_EcoS Sa179lw. This gene may help bacterial host colonization [28]. In addition, the BIS08 genome possesses ea22 (ORF 64) and eaF (ORF 64) genes, both of which are thought to promote lysogeny and regulate the lytic to lysogenic switch. BIS08 also carries an eaA gene (ORF 68) whose product presumably allows evasion of the host restriction modification system.
3.2.5. BIS08 Terminase Analysis
BIS08 has a terminase enzyme (ORF 18 and 19) whose large and small subunits show no nucleotide homology with any member of the Lederbergvirus genus or the unclassified Lederbergvirus Division (un-LD) apart from ST-35.
A protein BLAST search with the amino acid sequence of the terminase large subunit (TLs) indicated that only 10 % of the amino acid sequence exhibits 33 % similarity with any member of the Lederbergvirus genus and un-LD. Its amino acid sequence is identical to the ST-35 TLs, but ST-35 did not appear in the BLAST search because it is not annotated; only its FASTA sequence has been submitted to NCBI. The same was true for the nucleotide sequence of TLs, which was identical to ST-35 and had no other similarity in the entire genus (3 % query cover (QC) and 99 % similarity). Searching the whole class Caudoviricetes with BLASTn, rather than only the Lederbergvirus genus, produced similar results. The terminase nucleotide sequence was present in several prophages whose sequences otherwise had no homology with BIS08 or ST-35; however, judging from the types of proteins these prophages encode, they appear to be podoviruses.
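The distinction drawn above, between a hit that is nearly identical over a tiny stretch (high similarity, low query cover) and a true full-length homolog, can be made explicit when screening BLAST results. The sketch below assumes BLAST+ tabular output requested with `-outfmt "6 qseqid sseqid pident qcovs"`; the thresholds, the function names and the two example rows are illustrative assumptions (the rows only mirror the values quoted in the text and are not real search output).

```python
import csv
import io

# Column order assumed to match: -outfmt "6 qseqid sseqid pident qcovs"
COLUMNS = ("qseqid", "sseqid", "pident", "qcovs")


def parse_hits(tabular_text):
    """Parse tab-separated BLAST hits into a list of dictionaries."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(COLUMNS, row))
        record["pident"] = float(record["pident"])
        record["qcovs"] = float(record["qcovs"])
        hits.append(record)
    return hits


def full_length_homologs(hits, min_identity=90.0, min_query_cover=80.0):
    """Keep hits that are both highly identical and cover most of the query.

    Both cut-offs are hypothetical defaults chosen for illustration.
    """
    return [
        h for h in hits
        if h["pident"] >= min_identity and h["qcovs"] >= min_query_cover
    ]


# Illustrative rows only: an identical full-length hit and a short local match.
EXAMPLE = (
    "BIS08_TLs\tST-35\t100.0\t100\n"
    "BIS08_TLs\tother_genus_member\t99.0\t3\n"
)

if __name__ == "__main__":
    for hit in full_length_homologs(parse_hits(EXAMPLE)):
        print(hit["sseqid"], hit["pident"], hit["qcovs"])
```

Filtering on query cover as well as identity is what separates the single ST-35 match from the short, high-identity fragments described in the paragraph.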
The nucleotide sequence of the terminase small subunit (TSs) had no similarity within the Lederbergvirus genus; however, when compared against the class Caudoviricetes in a general nucleotide BLAST, it showed 99 % similarity with another phage, SEN8 (Acc. No KM202159.1) from family Peduovirinae, in addition to BIS08 and ST-35. The amino acid sequence of BIS08 TSs has similarity ranging from 66 % to 36 % with four other phages from un-LD (Acc. No URC09186.1, BEI45454.1, ABQ88402.1, URC09779.1, AZF93010.1, listed in order of decreasing similarity). To estimate terminase diversity in the Lederbergvirus genus and un-LD, we compared the TLs and TSs sequences from twenty selected accessions having varying degrees of nucleotide similarity with BIS08 (Table S5) in the VIRIDIC software. A maximum likelihood (MLA) phylogenetic tree using the GBDP method was constructed using VICTOR single gene phylogeny (Figure 3A and B). A VIRIDIC heat map is given in Supplementary Figure S3. The genus was found to have four distinct clades of TLs. Clade 3 (P22-like terminase) was the most abundant (60 %); all close homologs of BIS08 possess this type of TLs. Clade 4 is found in 25 % of accessions (Figure 3A), whereas Sf6 and BIS08 form distinct clades (Clade 1 and 2). This analysis placed ST-35 and BIS08 as a separate lineage. The same was true for the TSs phylogenetic analysis (Figure 3B). We performed a detailed in silico analysis, comprising protein domain prediction and structural modeling, of the BIS08 TLs and TSs to identify their differences from other terminases in the genus.
3.2.5.1. Protein Modeling of BIS08 Terminase Small Subunit (TSs)
An initial protein BLAST of the BIS08 terminase small subunit (TSs) amino acid sequence failed to identify a similar or previously published structure in the RCSB. To elucidate this apparently unique structure, the online modelling tool AlphaFold2 [29] was first used to assess the secondary structure of the monomeric form and the higher order quaternary structure formed by the subunits. Initial attempts to model the entire sequence as a monomer produced exceptionally poor scores in the N-terminal and C-terminal regions (residues 1-14 and 126-151, respectively), which were therefore omitted from the final monomer. This improved the pLDDT value from 85.7 to 92.9, with a pTM score of 0.656. The output models superimposed well, as per the scoring. The TSs sequence used to model the protein is shown in Figure 4A together with the secondary structure assignment, and the monomer is shown in Figure 4B. The N-terminal domain is formed by three short stacked alpha helices and a single long helix of 29 amino acids. The short stacked helical bundle forms a tight hydrophobic core at its center, maintained in part by Leu25, Leu35, Phe38, Phe39, Ile46, Phe56, Leu57 and Ile60. The central connecting domain comprises a long anti-parallel beta sheet loop (residues 81-106) followed by another alpha helix made up in large part of the predominantly hydrophobic residues between T109-TAAIFWLKN-R119. A DALI structure search (Holm 2006) revealed some structural similarity between our protein and the crystal structures of the terminase small subunit from Lederbergvirus Sf6 (PDB codes 4dyq-B and 3hef-B; R.M.S.D 6.2 - 6.3 Å, [30], [31]). The superimposed structures are depicted in Figure 4C, with most of the structural similarity stemming from the short stacked helical bundle, or N-terminal DNA-binding domains, of the proteins.
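For readers less familiar with the R.M.S.D values quoted for the superimposed structures, the quantity is the square root of the mean squared distance between equivalent atoms. The sketch below is a minimal illustration, assuming the two coordinate lists are already paired residue by residue and already superimposed; it does not reproduce the alignment and superposition that DALI performs, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points.

    Assumes the structures are already superimposed and that point i in
    coords_a corresponds to point i in coords_b (for example, matched CA atoms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equally sized, non-empty coordinate lists")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(f"RMSD = {rmsd(model, reference):.3f} Å")
```

Values are in the same units as the input coordinates, which is Å for PDB files.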
To decipher the higher order quaternary structure of the protein, AlphaFold2 [32] was used again, this time with six, eight and nine monomers. Again, the sequence length was adjusted based on the score and the ability of the program to complete the task successfully. Overall, and much like the quaternary structure of the Sf6 terminase, an octameric assembly was necessary to form a complete ring for the BIS08 TSs (utilizing residues 15-151) and produced high confidence scores for the top ranked complex (pLDDT=84.4, pTM=0.845 and ipTM=0.841; Figure 4D, lateral view; 4E, top view at 90°). The pictured octamer is a highly interwoven structure, especially with respect to the C-terminal domain (residues 127-151), which is made up of two long β-strands that form the neck of the protein and resemble an extended β-barrel. The main central body (residues 1-82) is formed by the aforementioned N-terminal DNA binding domain and is capped by the long anti-parallel beta sheet loop (residues 81-106). The structure has a maximum diameter of approximately 90 Å and a height of 102 Å, while the diameter measures 23 Å at the apex of the structure and 48 Å at the base.
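AlphaFold writes the per-residue pLDDT score into the B-factor column of its PDB output, so the effect of trimming poorly scoring termini can be inspected directly from a model file. The following is a minimal sketch for a single-chain (monomer) model; the 70-point cut-off and the function names are assumptions for illustration and are not the criterion used in this study.

```python
def ca_plddt(pdb_text):
    """Return {residue_number: pLDDT} read from the CA atoms of a PDB file.

    Uses the fixed PDB columns: atom name 13-16, residue number 23-26,
    B-factor 61-66 (where AlphaFold stores pLDDT). Single chain assumed.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def mean_plddt(scores, first, last):
    """Mean pLDDT over residues first..last (inclusive)."""
    values = [s for res, s in scores.items() if first <= res <= last]
    if not values:
        raise ValueError("no residues in the requested range")
    return sum(values) / len(values)


def confident_core(scores, cutoff=70.0):
    """Return (first, last) residue numbers with pLDDT >= cutoff, or None.

    Everything outside this window is a candidate for trimming; the cutoff
    is a hypothetical default.
    """
    good = sorted(res for res, s in scores.items() if s >= cutoff)
    return (good[0], good[-1]) if good else None
```

A typical use would be to read a ranked model file, call `confident_core` to find the well-modelled window, and compare `mean_plddt` over the full and trimmed ranges.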
3.2.5.2. Protein Modeling of BIS08 Terminase Large Subunit (TLs)
A BLASTP search of the BIS08 terminase large subunit (TLs) produced no published proteins of significant similarity. AlphaFold2 [32] was subsequently used to elucidate the structure (pLDDT=91.7, pTM=0.621), which showed two distinct protein domains (residues 1-275 and 276-486). Each domain was then reassessed independently in AlphaFold and subjected to online Dali homology searches [33].
The N-terminal domain of the TLs protein had exceptional AlphaFold scores (pLDDT=95.5, pTM=0.935), and subsequent DALI analyses (Table 1) described the protein as a terminase that binds ATP/purine ribonucleoside triphosphates (GO:0005524/GO:0035639). Like other members of this enzyme family, the BIS08 TLs AlphaFold structure possesses a central set of six parallel β-strands flanked on one side by an additional pair of short anti-parallel β-strands (residues 132-147), all sandwiched on either side by several α-helices (Figure 5A and 5B). Atop the enzyme are two additional alpha helices forming what is referred to as the Lid subdomain (residues 237-269), which sits over the ATPase active site [34], a common feature of this enzyme group (Figure 5C). The top four enzymes produced exceptional R.M.S.D overlap with the modeled protein; however, there were distinct differences in some secondary structure elements, including the elongated connecting loops and, in particular, the Lid domain of 3CPE (Sun 2008). One crystal structure (PDB code 4IEE) has a clearly defined ATP binding domain owing to the presence of the non-hydrolysable analog adenosine triphosphate-γ-S (AGS) [35]. Its presence reveals several common features of ATP binding among the terminases, including a pyrophosphate binding domain located on the most central alpha helix and loop region of our model (residues 76-SGHGIGKSA-84). Other residues within four angstroms of the terminal phosphate include Asn212 and Glu184. The adenosine moiety of ATP π-stacks with Trp30 and His179 of 4IEE. Given the probable loop movement in this region of BIS08, several candidates are possible for a similar interaction, including the tryptophans (Trp30, Trp88 and Trp123), Phe38 or Met243. Overall, residues Asn212, Glu184, Gln45, Gly81, Lys82, Gly79, His78, Gly77 and Glu84 remain identical or mostly identical among the structures, with the exception of His78 (an arginine in other structures) (Supplementary Figure S4).
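Residue labels such as Gly81 or Lys82 follow from numbering the quoted motif from its first position. The sketch below shows that bookkeeping for the 76-SGHGIGKSA-84 stretch; the helper names are hypothetical and the three-letter table covers only the residues that occur in this motif.

```python
# One-letter to three-letter codes for the residues present in the motif.
THREE_LETTER = {
    "S": "Ser", "G": "Gly", "H": "His",
    "I": "Ile", "K": "Lys", "A": "Ala",
}


def number_motif(first_residue, motif):
    """Map residue numbers to one-letter codes, starting at first_residue."""
    return {first_residue + i: aa for i, aa in enumerate(motif)}


def label(position, numbered):
    """Return a label such as 'Lys82' for a numbered position."""
    return f"{THREE_LETTER[numbered[position]]}{position}"


# Motif and start position as quoted in the text.
PHOSPHATE_BINDING_MOTIF = number_motif(76, "SGHGIGKSA")

if __name__ == "__main__":
    print([label(p, PHOSPHATE_BINDING_MOTIF) for p in (77, 78, 79, 81, 82)])
    # prints ['Gly77', 'His78', 'Gly79', 'Gly81', 'Lys82']
```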
A BLASTP search of the C-terminal domain of the BIS08 TLs protein (residues 276-469) was far more successful, showing homology with a number of gene 2 proteins from Lederbergvirus Sf6 (PDB code 5c10; [40]) and Shigella phage Sf6 (PDB code 4IDH; [35]), which share 33 % identity with our protein. AlphaFold2 [32] was subsequently used to elucidate the structure, with exceptional scores for the top ranked structure (pLDDT=95.9, pTM=0.927) and, in some cases, a sub two angstrom R.M.S.D to other published terminase proteins following Dali homology searches (Table 2; [36]). Characteristic of this protein class, the structure consists of a central five stranded beta-sheet sandwiched between alpha helices ([40]; Figure 6A and 6B), with residues 151-176 forming an extraneous beta hairpin and alpha helical arrangement. Within this well characterized active site we observe three acidic Asp residues (181, 86 and 33) in TLs and a single catalytic Lys166, a residue that mediates binding of the metal cofactor Mg2+ in terminase gp2 [40]. Among the homologous structures identified in DALI, only the Pseudomonas phage terminase protein (PDB code 8KDR-B; Dong 2023) and the Lederbergvirus Sf6 protein (PDB code 4IEE; [35]) possessed identically positioned active site residues.