1. Introduction
Bacteriophages are the most abundant biological entities on Earth, with an estimated 10^31 particles present on our planet. Among these, tailed bacteriophages with double-stranded DNA genomes are the most commonly observed and reported in the literature [1]. Until recently, tailed phages were classified within the order Caudovirales and described based on their morphology, possessing either a short tail (Podoviridae) or a long tail that is contractile (Myoviridae) or non-contractile (Siphoviridae). Given the extent of their genetic, host species and ecological diversity, the taxonomy of phages has recently been re-established to reflect these evolutionarily relevant factors and to remove reliance on morphology-based assignments [2]. The current viral class Caudoviricetes embraces 14 families assigned to one of four orders, while a further 33 families have been identified that have yet to be assigned to an order. Irrespective of the (sequence-based) taxonomy of phages, their morphology has a significant bearing on the types of interactions they establish with their respective hosts [3,4].
Phages with long, non-contractile tails (formerly called the Siphoviridae) are the most abundant in nature, and several of these have become models for studying phage-host interactions, including the Lactococcus phages TP901-1 and p2, Streptococcus thermophilus phage STP1, Bacillus subtilis phage SPP1 and Escherichia coli phages lambda and T5 [5,6,7,8,9,10]. These phages are genetically distinct; however, in many cases, functional modules within their genomes are syntenic [11]. Furthermore, despite sequence disparity between phages of the same or distinct bacterial host species, the structure of their components is often conserved [4,12,13,14,15]. The temperate lactococcal P335 phage TP901-1 is particularly well characterized with respect to its three-dimensional structure and its interactions with its host [5,16,17,18]. Furthermore, it has been employed as a model to understand the phenomenon of lysogeny and the factors that underpin the lytic-lysogenic lifestyle decision process [19,20,21]. The TP901-1 structural module encompasses 22 genes, which are responsible for the biosynthesis of the phage head, tail, baseplate and DNA packaging machinery (Figure 1).
TP901-1 possesses an isometric capsid and a long non-contractile tail of approximately 135 nm [22]. At the distal end of its tail, TP901-1 presents a large, multi-protein adhesion device termed a baseplate, which is composed of the tail tape measure protein (TMP), distal tail protein (Dit), tail-associated lysin (Tal), and upper (BppU) and lower (BppL) baseplate proteins [5,23]. The BppL component is the bona fide receptor binding protein (RBP) and is present as 18 trimers in the whole baseplate structure, which likely recognize and bind to cell wall polysaccharides presented on the surface of the host lactococcal cell. Its exquisite host specificity is supported by heterologous expression of a Lactococcus lactis genetic region encoding host-specific glycosyltransferases associated with cell wall polysaccharide biosynthesis [24].
The adhesion device of TP901-1 is in an infection-ready conformation and, unlike that of certain tailed phages such as the lactococcal phage p2, does not require activation by divalent cations such as calcium [5,25]. Mutational analyses of TP901-1 genes coding for the capsid and tail proteins have provided insights into the assembly of the mature virion and the function of several previously uncharacterized gene products [17]. Furthermore, detailed genetic and structural analyses of the tail and baseplate proteins have rendered TP901-1 one of the best characterized phages capable of infecting Gram-positive bacteria [5,22,26].
The recent development of AlphaFold has significantly advanced our ability to predict individual and multi-protein structures [27,28,29] and has transformed the field of structural biology. The ever-increasing number of phage genome sequences, protein sequences and protein structures in public databases reinforces the need for reliable and rapid methods for determining (multi-component) protein structures and associated protein functionality. In this study, we present predicted structures of the TP901-1 capsid and neck, its extended tail, and its complete baseplate, which was previously partially determined by X-ray crystallography. AlphaFold predicted reliable structures for large parts of the phage, yet was unsuccessful in a small number of cases, probably those in which folding is chaperone-aided.
4. Discussion
The virion structure of the tailed phage that we present here is one of the most complete to date. For such prediction-based structural work, the validity of the predicted structures is an important issue. AlphaFold2 (AF2) has been recognized to provide crystal-grade structures provided that the pLDDT values are sufficiently high, i.e., above 80% [27,28,29,56]. Here, we checked the structure validity at three levels. First, internally to AF2, we accepted structures that met the AF2 internal validation score given by the pLDDT values, i.e., when the pLDDT values were above 70% in the folded regions. Although loops are often below this threshold, they do not significantly affect the overall fold. We also examined the PAE (predicted aligned error) plots, which give confidence scores for protein-protein interactions, and verified that the pLDDT values in the protein complex were equal to or higher than those of the non-complexed structure. Second, we used an external validity criterion by assessing the agreement between the predicted assemblies and the experimental TP901-1 nsEM 3D reconstructions in the 15-25 Å resolution range. We also used the available crystal structures to evaluate the capabilities and limitations of AF2 predictions. Third, we checked with the Dali server [30] whether the predicted folds had already been reported in the PDB, although this was not an obligate criterion, as structures with new folds may be generated (see below). Additionally, we used the PISA server [57] to analyse the quality of the interactions within our assemblies.
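The first, internal validation level can be illustrated with a minimal Python sketch. It assumes that the per-residue pLDDT is stored in the B-factor field of the predicted PDB file, as in standard AlphaFold2 output, and it applies the 70 threshold quoted above. The function names and the synthetic four-residue input are hypothetical and serve only as an illustration; this is not the procedure used in this study.

```python
"""Summarize per-residue pLDDT from an AlphaFold2-style PDB file (sketch)."""

PLDDT_THRESHOLD = 70.0  # acceptance threshold quoted in the text


def ca_plddt(pdb_text):
    """Return one pLDDT value per residue, read from CA atoms.

    Assumes pLDDT is stored in the B-factor field (columns 61-66),
    as in standard AlphaFold2 PDB output.
    """
    values = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def summarize(values, threshold=PLDDT_THRESHOLD):
    """Return (mean pLDDT, fraction of residues above the threshold)."""
    if not values:
        raise ValueError("no CA atoms found")
    mean = sum(values) / len(values)
    fraction = sum(v > threshold for v in values) / len(values)
    return mean, fraction


def make_ca_line(serial, resseq, plddt):
    """Build a synthetic fixed-width CA record (illustration only)."""
    template = (
        "ATOM  {:>5d}  CA  ALA A{:>4d}    "
        "{:>8.3f}{:>8.3f}{:>8.3f}{:>6.2f}{:>6.2f}"
    )
    return template.format(serial, resseq, 0.0, 0.0, 0.0, 1.0, plddt)


# Synthetic four-residue example; these pLDDT values are made up.
demo = "\n".join(
    make_ca_line(i, i, p) for i, p in enumerate([90.0, 85.0, 60.0, 75.0], start=1)
)
mean_plddt, frac_ok = summarize(ca_plddt(demo))
print(f"mean pLDDT = {mean_plddt:.1f}, fraction above 70: {frac_ok:.2f}")
```

In practice the same summary could be restricted to folded regions, since loops often fall below the threshold without affecting the overall fold.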
In our predicted structure of the TP901-1 virion, the hexon and penton of the capsid were well predicted, except for the N-termini, possibly because they are involved in interactions with other hexons or pentons. Predictions of hexon/hexon or hexon/penton interactions were unsuccessful: the hexon and penton were predicted as separate units without any interactions. Procapsid assembly requires a so-called scaffolding protein (SP), which acts as a chaperone to ensure correct positioning of the subunits [58,59]. Structures of full or partial SPs have been reported (PDB IDs 1tx9, 2gp8, 1no4, 8dt0); they are helical proteins, as seen in phage phi29, where the SP forms a dimer of two long α-helices [58]. In the procapsid of staphylococcal phage 80α, the SP C-terminal α-helix was observed in contact with each of the capsid components [60]. In phage TP901-1, the predicted structure of the expression product of orf35, which is located immediately upstream of the MCP-encoding gene, displays a pattern of two long helices and some shorter ones, making it a good candidate SP.
The predicted dodecameric portal exhibits all the characteristics of genuine phage portals previously reported [43,61,62]. However, the TP901-1 portal has a unique feature: its clip domain contains three β-strands originating from two different portal monomers and a fourth β-strand coming from the adaptor. The TP901-1 adaptor also presents a unique feature: a 24-stranded β-barrel that forms its internal channel. Interestingly, the rather loose, non-compact interface between the portal and adaptor is compensated for by a ring of six NPS trimers whose N-terminal domains cover the adaptor ring. The hexameric stopper of TP901-1 is similar to that of other phages and exhibits long β-hairpins that embrace the tail tube upon complexation.
The tail and baseplate components, including the TT, MTP, and Dit, share structural features that have previously been reported [15,63]. Superimposition of the monomers reveals that the three components possess a β-hairpin that interacts with the other subunits of a hexameric ring and also provides a stacking platform between hexameric rings. The MTP and Dit also possess an extended N-terminus that inserts into a crevice of the hexamer above in the MTP/MTP and Dit/MTP interactions. While platforms of β-hairpins were also observed in the MTP stacking of the tailed phages 80α [47] and T5 [55,64], the N-terminal lock was only observed in 80α, in which a β-helix assists the interaction.
The structure of the TP901-1 baseplate has previously been reported, though without its Tal component [5]. Here, the full-length Tal completes the X-ray structure. The central channel of the Tal N-terminal structural domain (residues 1-380) is filled by three α-helices that link this domain to the functional C-terminal domain. Surprisingly, in phage 80α this central channel is filled by the C-termini of the three TMPs contained in the tail tube [47]. However, the position of the three helices in TP901-1 Tal is logical considering the Tal topology. Indeed, these helices and the C-terminal domain should dramatically rearrange to allow TMP exit and DNA ejection during the initial stages of infection. Such a rearrangement has recently been reported for phage T5: upon contact of the tail tip with T5’s receptor, the membrane protein FhuA, the Tal-like protein pb3 that obstructs the tail exit channel opens and rotates to the side of the tail, thus allowing the TMP (pb2) to insert into the membrane [10,55]. We postulate that a similar mechanism operates in phage TP901-1 upon binding of the baseplate to the cell wall polysaccharide.
The prediction of the RBP structures and of their interaction with BppU was excellent, as were the prediction and localization of 12 of the 18 BppU N-terminal domains, which form a ring similar to the NPS ring or to the tail fiber rings observed in phage 80α [47]. However, the six remaining BppU N-terminal domains and the BppU topology were out of reach because the complete structure is well over the residue limit. Lastly, all attempts to model the TMP in complex with the MTP, Dit, and tail chaperone were unsuccessful.
To conclude, we have found that AF2 has impressive prediction capabilities, provided that the ensemble used for prediction comprises fewer than approximately 4000 residues. Most of our prediction failures were linked to this limitation. However, as evidenced by the capsid and parts of the baseplate, this factor may not be the only limitation. Despite this, we illustrate here that AF2 predictions provide an avenue for further investigation into phage structure. This is particularly important for phages that remain poorly characterized at the structural level, such as those infecting human pathogens and Mycobacteria [65].
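The residue limit mentioned above lends itself to a simple arithmetic check before a prediction is attempted. The following Python sketch sums copies multiplied by chain length for a proposed ensemble and compares the total with the approximate 4000-residue limit. The copy numbers follow the oligomeric states named in the text, but the chain lengths are hypothetical placeholders rather than TP901-1 values, and the function names are illustrative assumptions.

```python
"""Check whether a proposed assembly fits an approximate residue budget."""

RESIDUE_LIMIT = 4000  # approximate practical limit quoted in the text


def total_residues(components):
    """Sum copies * chain length over (name, copies, length) tuples."""
    return sum(copies * length for _name, copies, length in components)


def fits_budget(components, limit=RESIDUE_LIMIT):
    """Return True if the assembly stays below the residue limit."""
    return total_residues(components) < limit


# Copy numbers follow the oligomeric states named in the text;
# the chain lengths are HYPOTHETICAL placeholders, not TP901-1 values.
stopper_ring = [("stopper", 6, 110)]
portal_adaptor = [("portal", 12, 450), ("adaptor", 12, 100)]

for label, assembly in [
    ("stopper ring", stopper_ring),
    ("portal + adaptor", portal_adaptor),
]:
    n = total_residues(assembly)
    verdict = "within" if fits_budget(assembly) else "over"
    print(f"{label}: {n} residues ({verdict} the ~{RESIDUE_LIMIT}-residue budget)")
```

Under these placeholder lengths, an ensemble that exceeds the budget would presumably have to be divided into smaller sub-assemblies, which is consistent with the observation that very large complexes were out of reach.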
Figure 1.
Schematic representation of the structural module of TP901-1. The functions are indicated above the arrows and the scale bar is presented at the base of the schematic in base pairs (bp). This figure and associated functions are based on those described in reference [17].
Figure 2.
Capsid’s hexon and penton structures. (A) Ribbon view of the predicted hexon structure. One monomer is rainbow colored and its domains are labeled. The monomer at the figure’s bottom is colored according to pLDDT (quality of the prediction from blue to red). Note the low-confidence structure of the N-terminus. (B) Surface representation of the hexon (same orientation and scale as in (A)). (C) Ribbon view of the predicted penton structure. (D) Surface representation of the penton (same orientation and scale as in (C)). (E) Ribbon view of the structures of two hexons (H1 and H2) and one penton (P) fitted in the nsEM 3D reconstruction (EMD-2133).
Figure 3.
The dodecameric portal and adaptor structures. (A) The portal dodecamer has a size of 160 × 110 Å. (B) Structure of the portal monomer with the different domains conserved in all portals. The C-terminal extension is poorly predicted and is disordered. (C) Lateral view of the stopper with the extended C-terminal α-helices and the central β-sheet. (D) Same view rotated 90° relative to (C).
Figure 4.
Portal to major tail protein structure. (A) Dodecameric portal (yellow) and adaptor (violet), hexameric stopper (light blue), tail terminator (TT, green) and major tail protein (MTP, grey). (B) The portal/adaptor interface (close-up view in inset 1, i1) involves four β-strands: portal i s13 and s11 (anti-parallel), portal i+1 s12 and the adaptor C-terminal β-strand, which stacks against α-helix 6 and portal i+1 β-strands s11 and s13. (C) The adaptor/stopper interface involves two adaptors (beige and orange) and one stopper (violet). Contacts originate from the stopper’s N-terminus and the loops joining the β-strands of the three monomers. The insertion of the adaptor displaces α-helix 6 (inset 2, i2). (D) Stopper/TT interface: the stopper’s long hairpin s2-s3 (light green) is inserted between two TT monomers (red and beige). Other contacts involve the TT’s long loops h1-s1 and s3-s5. (E) The TT/MTP interface. The MTP i-1 N-terminus (beige) is inserted between MTP i (orange) and TT (light blue), and comprises the majority of TT/MTP contacts. The MTP C-terminus is inserted vertically between two TT monomers.
Figure 5.
The fit of the neck in the nsEM density map. (A) The fit of the neck (portal 12-mer, yellow; adaptor 12-mer, pink; stopper 6-mer, light blue; NPS N-termini 12-mer, white) in the nsEM map at 20.0 Å resolution. (B) Same view as in (A), but slabbed at half diameter. The NPS N-termini are located just below the portal/adaptor junction. (C) View at 90° from (A). The NPS N-termini cover the cavities observed at the portal/adaptor interface. (D) The phage 80α fiber structure was superimposed onto the NPS N-termini to illustrate a possible structure of the NPS. (E) Surface view of the structural prediction of the NPS trimer with a total length of 48 nm.
Figure 6.
The tail tube. (A) View of four MTP rings forming a section of the tail tube (the top of the view points toward the capsid). Note the insertion of each monomer C-terminus inside a monomer cavity in the ring above. (B) Same orientation as in (A) but slabbed at half diameter and colored according to electrostatics. (C) Fit of the MTPs in the nsEM map at 20 Å resolution (EMD-2228). (D) MTP/MTP interactions between stacked rings. Interactions involve the N- and C-termini and the β-hairpin with strands s3 and s4.
Figure 7.
The distal MTP and the central baseplate. (A) Surface view of the distal hexameric MTP (grey), the hexameric Dit (blue) and the trimeric Tal (orange). Ga: Dit’s galectin domain; St: Tal’s conserved structural domain; Ld1 and Ld2: linker domains 1 and 2; pgd and endopeptidase (ep) domains. (B) Ribbon view of the MTP/Dit interface. (C) Ribbon view of the Dit/Tal interface. (D) Ribbon side-view of the Tal trimer N-terminus (residues 1-484). (E) Same view rotated by 90° within the phage main axis.
Figure 8.
The peripheral baseplate. (A) Surface view of the dodecameric BppU N-terminal domain (yellow) complexed to the hexameric Dit (blue); (MTP: grey; Tal: orange). (B) Slabbed view of the nsEM map (EMD-1793; 25 Å resolution) of the complete baseplate with the distal hexameric MTP (grey), the hexameric Dit (blue), the trimeric Tal (orange) and the dodecameric BppU N-terminal domain (yellow) fitted inside. The grey ribbon inside the map represents the X-ray structure (PDB ID 4v96). (C) The AF2-predicted structure of the “tripod” formed by the trimeric BppU C-terminal domain holding three RBP trimers. (D) Close-up of the BppU/RBP interface with the hydrophobic residues Ile 219, Phe 226 and Phe 232 also identified in the X-ray structure. (E) Surface view of the “tripod” fitted in the nsEM map.