3.3.1. Pre-propeptide domain
Like most subtilisin-like proteases, the PrtP homologs were synthesized as pre-proenzymes and were first activated by cleavage of the PP-domain region after translocation across the cell membrane.
Most PrtP homologs had a pre-peptide predicted to be a typical secretory signal (SS) peptide for translocation across the cell envelope. SignalP did not identify a clear SS peptide for PrtP
NFICC96H (
Figure S1), although its pre-peptide region shared sequence characteristics with the other predicted tripartite SS peptides of the PrtP homologs, containing a hydrophobic core region and two polar flanking regions. The length of the pre-peptides varied from 24 to 64 residues; as in other SS peptides, the N-terminal polar flanking region accounted for most of this variation [
25]. A cleavage site for the signal peptidase I was predicted at the end of the pre-peptide regions. Signal peptidase I releases translocated pre-proteins from the cytoplasmic site to the cell envelope, whereas non-cleavable N-terminal SS peptides can serve as cytoplasmic membrane anchors [
26]. To our knowledge, cleavage after the pre-peptide of PrtP homologs has not been elucidated, as the pre-peptide region might be removed together with the propeptide as part of the autonomous removal of the PP-domain.
The propeptides could be divided into two structural groups, as they were predicted to be either ID regions or structured regions. These structural differences were also reflected in the pLDDT of the AF models. Here, regions with low pLDDT were indicated to be ID regions, given that long regions with pLDDT < 50 can be interpreted as a prediction of ID [
27]. The ID characterization of the propeptides was corroborated by NetSurfP-3.0 predictions. The ID propeptides included those of ScpA, PrtS, ScpC, PrtH2, PrtR, PrtP
NFICC80 and PrtP
NFICC200, which represented Clusters I-III, XI and XII (
Figure 3). These propeptides differed in length (48–147 aa) and had no apparent conserved sequence similarities. On average, the ID propeptides had 20 % charged residues (range 10–32 %), of which 15 % were acidic (Asp, Glu; 4–27 %) and 5 % were basic (Arg, Lys, His; 0.9–9 %). The content of Pro residues was relatively high, with an average of 7 % (range 4–14 %), while Cys was absent. The structured propeptides were represented by the other 14 PrtP homologs, including subtilisin. Compared to the ID propeptides, the structured propeptides had higher pLDDT and were generally longer, with 145–172 aa. The average content of charged residues was 25 % (17–39 %), of which the acidic and basic residue contents were on average 11 % (8–21 %) and 13 % (9–22 %), respectively. The content of Pro residues was 3 % on average (1–6 %), whereas Cys residues were absent. The structured propeptides had a conserved structural fold (RMSD < 2.1 Å) that was composed of β1-α1-β2-β3-α2-β4 (
Figure 3). The four β-strands formed an antiparallel β-sheet, while the two helices were antiparallel to each other. α2 was the shortest helix, with two turns, whereas α1 generally had seven to eight turns, except for α1 in the propeptide of subtilisin. The propeptide of subtilisin shared 14–25 % sequence identity with the other homologous propeptides, which had 24–58 % sequence identity across Clusters IV, V, VI, IX and X. Except for the short propeptide of subtilisin, the conserved core of the structured propeptides had an N-terminal helix region of 50–65 aa. This region was less conserved, with 6–38 % sequence identity between clusters, and adopted different helical structures.
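The residue-content percentages reported above can be reproduced for any propeptide sequence with a short script. The following is a minimal sketch, assuming that percentages are taken over the full propeptide length and that His is counted as basic, as in the grouping used in the text; the function name and the example fragment are hypothetical and are not taken from any PrtP homolog.

```python
from collections import Counter

# Residue classes as grouped in the text (His is counted as basic there).
ACIDIC = set("DE")
BASIC = set("RKH")


def composition(seq: str) -> dict:
    """Return the percentage of acidic, basic, charged, Pro and Cys residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    n = len(seq)
    acidic = sum(counts[a] for a in ACIDIC)
    basic = sum(counts[a] for a in BASIC)
    return {
        "acidic": 100 * acidic / n,
        "basic": 100 * basic / n,
        "charged": 100 * (acidic + basic) / n,
        "pro": 100 * counts["P"] / n,
        "cys": 100 * counts["C"] / n,
    }


# Hypothetical 20-residue fragment used only to illustrate the calculation.
example = "MDEEPKTSAPDELNGSTPQA"
print(composition(example))
```

Averaging these per-sequence values over a group of propeptides would give group-level figures of the kind quoted above.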
The structured propeptides formed complexes with the PR-domains in the AF models of PrtP
NFICC96H, PrtH, PrtB and PrtL. The conserved fold established an interaction surface with the PR-domain through the N-termini of α1 and through the β-sheet. The α1 helix interacted with a three-turn α helix of PR. This α helix was absent in subtilisin, which also had a shorter α1 in its propeptide. The β-sheet packed against the N-termini of two parallel helices in the PR-domain. Hereby, β4 directed an unstructured protein region into the catalytic cleft of PR where the catalytic triad was in spatial proximity to the cleavage site (
Figure 3). The cleavage sites of PrtP homologs had been proposed to follow the motif pattern KVY[YH][PA][TN]↓D [
5], but this motif was not conserved across clusters of the PrtP homologs. Other aa constraints might apply to this region, which appeared to have a distinct length of seven aa between β4 and the cleavage site (
Figure 3).
The conserved structure of the propeptides appeared to dictate the cleavage site for the autocatalytic processing step, removing the propeptide from the mature PrtP homologs. In contrast, the lack of a fixed tertiary fold of the ID propeptides and the lack of the conserved cleavage motif suggested a less distinct cleavage site for propeptides characterized by ID as observed for ScpA. The propeptide of ScpA contained several processing sites for hydrolysis by either autocatalytic intramolecular cleavage or by exogenous protease activity [
28]. The propeptide of ScpC had been experimentally verified to be characterized by ID, but this region also displayed significant helical propensity, which could be important for the secretion and folding processes of ScpC [
20]. However, the structural differences of the propeptides among PrtP homologs indicated that PrtP homologs might follow different maturation processes guided by the structural characteristics of their propeptides.
3.3.2. Catalytic region
The catalytic region followed the N-terminal PP-domain and consisted of a PR-domain with or without an inserted PA-domain (
Figure 2). The conserved PR-domain of the PrtP homologs had a well-defined alpha/beta fold containing a 7-stranded parallel β-sheet that characterized the subtilase family. PR was the most conserved domain, with 25–69 % sequence identity across Clusters I–XIII. The PA-domain divided the PR-domain into two regions, PR1 and PR2, which were equally conserved. PR1 had 231–301 aa and was larger than the PR2 region with 114–132 aa (
Figure 2). Together, the PR1 and PR2 regions formed a domain of 347–416 aa, similar in size to the PR-domains without insert domains (362–370 aa). The model structures aligned well when superimposed on the PR-domain of PrtP
MS22337 (RMSD < 1.2 Å). The PR-domain of subtilisin differed from the other PR-domains by its smaller size of 274 aa and the absence of a three-turn helix structure (
Figure 4). This short helix structure was surface exposed and might interact with the propeptide during the folding process as discussed above (
Figure 3). The residues of the catalytic triad, with Asp and His in the PR1 region and Ser in the PR2 region, were spatially close to each other, with distances of approximately 7.5, 8.4 and 10 Å between the C
α-atoms in Asp/His, His/Ser and Asp/Ser, respectively.
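Distances of this kind are plain Euclidean distances between Cα coordinates in a model structure. The sketch below illustrates the calculation; the coordinates are hypothetical placeholders and are not taken from any of the AF models, and in practice the Cα positions would be read from the model's coordinate file.

```python
import math
from itertools import combinations


def ca_distances(ca_coords: dict) -> dict:
    """Pairwise Euclidean distances (in Å) between named Cα positions."""
    return {
        (a, b): math.dist(ca_coords[a], ca_coords[b])
        for a, b in combinations(ca_coords, 2)
    }


# Hypothetical Cα coordinates in Å, chosen only to illustrate the calculation.
triad = {
    "Asp": (0.0, 0.0, 0.0),
    "His": (7.5, 0.0, 0.0),
    "Ser": (6.0, 8.0, 0.0),
}

for pair, distance in ca_distances(triad).items():
    print(pair, round(distance, 1))
```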
The catalytic region of the PrtP homologs from Clusters XI–XIII, including subtilisin, lacked a PA-domain. The remaining ten clusters contained PrtP homologs with PA-domains ranging from 137 to 189 aa. The PA-domains had 9–55 % sequence identity across clusters, whereby PA had up to 45 % lower sequence identity than the rest of the catalytic region. The PA-domains showed structure homology when superimposed on the PA-domain of PrtP
MS22337. Most RMSD-values were below 2.5 Å, except for the PA-domains of ScpC and PrtP
NFICC96H, which had RMSD-values of 3.0 Å and 3.3 Å, respectively. Secondary structures were not well defined in the PA-domain of ScpC. The other PA-domains shared a common core fold, β1-β2-β3-α1-β4-β5-α2-β6, in which β1 and β6 formed an antiparallel β-sheet, marking the N- and C-terminal boundaries of the PA-domain. This antiparallel β-sheet was distinctive for all PA-domains, though the lengths of the β-strands varied. A parallel β-sheet comprising β2, β3, β4 and β5 made up the core, which was surrounded by two peripheral helices, α1 and α2 (
Figure 4). Other adjacent but less defined secondary structures were present in some PA-domains, as for PrtP
NFICC96H. The PA-domain of PrtP
NFICC96H had 41–52 more residues than its homologs, extending the loop between α2 and β6 with some helical structure.
3.3.3. Tail of fibronectin-like domains
Following the PR-domain, the A- and B-domain regions formed a C-terminal tail of domains with predominating β-strand structures (
Figure 2). Subtilisin and ScpA were different from the other PrtP homologs. Subtilisin lacked both the A- and B-domain regions, and ScpA lacked the B-domain region. The A-domain regions always had three fibronectin type-III-like domains (Fn1–Fn3), whereas the B-domain regions had two to seven (Fn4–Fn10) (
Figure 2). For all PrtP homologs, Fn1 was the only Fn-domain that was identified as the Pfam domain Fn3_5. This fibronectin type-III-like domain lacked disulfide bonds and had a structural core with a sandwich fold. This fold was made up of two antiparallel β-sheets, one with three strands and one with four, with the N- and C-termini oriented at opposite ends [
16]. Fn2 was generally the largest of the Fn-domains, containing 153–217 aa. The other Fn-domains had 73–164 aa.
The Fn-domains were compared intra- and intermolecularly by generating a phylogenetic tree based on an MSA of all Fn protein domain sequences (
Figure 5). All Fn1 and Fn2 domains were in two distinct phylogenetic clusters, indicating that these domains had conserved functionality among PrtP homologs. The other Fn-domains had a tendency to cluster according to their numbers, but some Fn-domains were outliers and did not follow this pattern in their clustering. It should be noted that not all branches had a bootstrap support above 70 % (
Figure 5). However, the general clustering pattern suggested that the Fn-domains were conserved across and not within the PrtP homologs and that Fn-domains along the fibronectin-like tails of the PrtP homologs were different from each other. Intramolecular comparison of the Fn model structures in PrtP
MS22337 (RMSD 1.9–18 Å) and ScpC (RMSD 1.6–18 Å) also indicated structural differences between the Fn-domains, but some Fn-domains were also suggested to share structure homologies (
Figure S6). In order to clarify the structural characteristics of the A- and B-domain regions, the structures of the Fn-domains were superimposed according to their designated numbers (
Figure 2) and to their clustering patterns in the phylogenetic tree (
Figure 5).
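The RMSD values used throughout these comparisons summarize how far paired atoms lie from each other after superposition. The sketch below shows only the final RMSD step and assumes that the two coordinate sets are already superimposed and paired residue by residue; the superposition and residue-pairing procedure is not specified here, and the example coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """RMSD (in Å) between two equally long, already superimposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical Cα traces: the second is the first shifted by 1 Å along z.
trace_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
trace_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(trace_a, trace_b))
```

Because every pair in this toy example is displaced by the same 1 Å, the RMSD equals that displacement; real domain pairs mix well-aligned cores with divergent loops, which is why loop-rich domains give larger values.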
The Fn1 domains had 124–168 aa and showed structure homology when superimposed on PrtP
MS22337 (RMSD < 3.0 Å) (
Figure 6A). However, the sequence identity was down to 15 %. The sandwich fold of the Fn1 domain corresponded to the fibronectin type-III-like domain. The homologous Fn1 domains had structural variations in the regions Fn1
β3-β4 and Fn1
β5-β6. Fn1
β3-β4 had a β-hairpin structure in the PrtP homologs, except for ScpA, PrtS, ScpC and PrtP
NFICC96H. PrtP
NFICC96H had an extended loop whereas ScpA, PrtS and ScpC possessed shorter linker regions. Fn1
β3-β4 was oriented towards Fn2, with which Fn1 formed an interface. In ScpA, PrtS and ScpC, Fn1
β5-β6 had a two-turn helix structure that was located on the surface, pointing away from the PR-domain. The other structure homologs had shorter Fn1
β5-β6 linker regions. ScpA contained the cell adhesion motif RGD at the solvent-exposed C-terminus of its Fn1 domain, whereas PrtS contained RGE, a potentially inactive version of the cell adhesion motif [
16,
29]. The Fn1 domain of ScpC ended with the sequence pattern KGQ while the other Fn1 homologs lacked positive charges and instead had the C-terminal sequence pattern [YF]G[DQS].
The large Fn2 domains with 157–217 aa also showed structure homology when superimposed on PrtP
MS22337 (RMSD < 3.2 Å) (
Figure 6B). The sequence identity for Fn2 domains of the different clusters was in the range 9–48 %. The sandwich fold was surrounded by several loops and secondary structures that differed among the PrtP homologs. Fn2
β1-β2 was a large and variable region in terms of sequence length, loop formation and secondary structure. This region of ScpA had previously been divided into two loop regions, of which the first (733–757) was suggested to contribute to substrate binding [
16]. Fn2
β4-β5 was oriented towards the active site, approximately 15 Å from the catalytic triad, when Fn2
β4-β5 formed an extended β-turn structure or a β-hairpin. This region had also been proposed to be involved in substrate specificity [
21]. However, Fn2
β4-β5 was much shorter in PrtP
NFICC96H than in the other PrtP homologs, locating Fn2
β4-β5 further away from the catalytic cleft. Fn2
β5-β6 pointed away from the active site and was closer to Fn3. This region formed a β-hairpin in most of the PrtP homologs, whereas Fn2
β5-β6 of ScpA, PrtS, ScpC, PrtB and PrtL had a shorter turn structure. In the phylogenetic tree, all Fn2 domains clustered together, but this cluster also included Fn4 of ScpC (
Figure 5). When this Fn4 domain was superimposed on the Fn2 domains of PrtP
MS22337, ScpC and PrtP
NFICC96H, the RMSD values were 3.2, 2.9 and 2.1 Å, respectively (
Figure 6B). This suggested that the Fn4 domain of ScpC had a Fn2-like structural fold, but the loop regions Fn2
β1-β2 and Fn2
β4-β5 were shorter, indicating that the Fn4 domain of ScpC might have another functionality.
The Fn3 domain was the smallest in the A-domain region, with 102–121 aa. This domain had two structurally different folds, both with fibronectin type-III-like domain cores (
Figure 6C,D). The Fn3 domains of ScpA, PrtS and ScpC made up a phylogenetic cluster together with Fn5 of ScpC (
Figure 5). The domains of this cluster differed from the other Fn3 domains and had RMSD-values below 1.8 Å when superimposed on ScpC’s Fn3 (
Figure 6D). The Fn3
β3-β4 and Fn3
β5-β6 regions had different structures in the homologs. ScpA had no β-hairpin in Fn3
β3-β4, but was the only homolog with a β-hairpin in Fn3
β5-β6. Despite the distribution of the Fn3 domains in the phylogenetic tree, the other Fn3 domains showed structure homology when superimposed on the Fn3 domain of PrtP
MS22337 (RMSD < 1.9 Å) (
Figure 6C). Fn3
β5-β6 had a helical structure that pointed away from the PR-domain and towards the first Fn-domain in the B-domain region.
The phylogenetic tree of the Fn-domains suggested that the Fn-domains entailed more sequence variations in the B-domain region than in the A-domain region (
Figure 5). The largest phylogenetic clusters of Fn4, Fn5, Fn6 and Fn7 each had a characteristic sandwich fold with similarity to a fibronectin type-III-like domain (
Figure 6E–H). The two Fn10 domains did not adopt a well-defined structure in the AF models but were predicted by NetSurfP-3.0 to be prone to form β-strand structures. The Fn8, Fn9 and Fn10 domains were in the same phylogenetic cluster, in which the Fn8 and Fn9 domains had the same structural fold, with RMSD-values below 2.2 Å when superimposed on Fn8 in PrtP
MS22337 (
Figure 6I). This suggested that the Fn8, Fn9 and Fn10 domains were repeating domains, probably the result of sequence duplications. A duplication event of the Fn2 and Fn3 domains of ScpC might also have generated the Fn4 and Fn5 domains, indicated by their sequence and structural similarities outlined above (
Figure 6C,D). The Fn6 and Fn7 domains of ScpC shared sequence and structural similarities with the Fn4 and Fn5 domains in PrtS, respectively (
Figure 5 and
Figure 6J,K). These domains appeared to differ from the other Fn-domains. The Fn5 domain of PrtH2 and the Fn4 domains of PrtP homologs of Cluster XII had sequence and structural similarities, but differed from the other Fn-domains (
Figure 6L). PrtP homologs from Cluster XII had the same B-domain region as PrtH2 but lacked the Fn4 domain (
Figure 2). As a result, the Fn5 and Fn6 domains of PrtR, PrtP
NFICC80 and PrtP
NFICC200 corresponded to the general Fn6 and Fn7 domains (
Figure 6G,H). The B-domain region of PrtP
NFICC96H showed structural deviations, including its Fn8 domain that shared similarities with the general Fn5 domain. The other Fn-domains of PrtP
NFICC96H could not be properly classified, as these structures were too uncertain, as reflected by their low pLDDT (
Figure S5).
In total, the Fn-domains of the PrtP homologs were categorized into 12 structurally different domains (
Figure 6). The conserved Fn1 and Fn2 domains were characterized by their close spatial proximity to the PR-domains of the PrtP homologs. The Fn3 domains provided the transition to the B-domain region with different Fn-domain composition, creating a backbone structure of repeating sandwich folds with five to seven β-strands. Hereby, some Fn-domains deviated from the strict definition of a fibronectin type-III-like domain, but the typical Fn-domains in the B-domain regions (
Figure 6E–L) were around 4 nm measured parallel to their sandwich folds, just like the Fn3 domain. The sandwich fold of Fn1 was around 6 nm, while that of Fn2 was around 5 nm. In an extended rod-like structure, the backbone of the A-domain region would be around 15 nm, whereas the different B-domain regions could potentially contribute a further 8–28 nm.
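The length estimates above follow from summing the approximate domain lengths end to end. The sketch below reproduces that arithmetic under the assumption that the domains are arranged as a straight rod without overlap or linker contributions; the variable names are illustrative only.

```python
# Approximate domain lengths (nm) as stated in the text.
FN1_NM, FN2_NM, FN3_NM = 6, 5, 4
TYPICAL_B_FN_NM = 4      # typical Fn-domain in the B-domain region
B_DOMAIN_COUNT = (2, 7)  # B-domain regions had two to seven Fn-domains

# A-domain region: Fn1 + Fn2 + Fn3 placed end to end.
a_region_nm = FN1_NM + FN2_NM + FN3_NM

# B-domain region: minimum and maximum number of typical Fn-domains.
b_region_nm = tuple(n * TYPICAL_B_FN_NM for n in B_DOMAIN_COUNT)

print(a_region_nm, b_region_nm)
```

With these inputs the A-domain region sums to 15 nm and the B-domain regions to 8–28 nm, matching the values given in the text.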
3.3.5. Cell wall spacing domain
The cell wall spacing domain, also known as the W-domain, preceded the cell wall attachment domain and ranged from 36 to 234 aa in length (
Figure 2). All PrtP homologs with cell wall attachment domains contained a W-domain, whereas the W-domain was absent in PrtP
NFICC96H, as was the cell wall attachment domain.
The W-domains appeared as protein regions with no well-defined globular structures in the AF models. Instead, the W-domains of the AF models had very low pLDDT, indicating that these regions were characterized by ID (
Figure S5). NetSurfP-3.0 corroborated that the W-domains were characterized by ID, which distinguished the W-domain from the neighboring domains. The W-domains were quite hydrophilic, with a generally low content of hydrophobic residues and an almost complete lack of Cys, Met and aromatic residues (Phe, Tyr and Trp). Low complexity characterized the sequences, as these were rich in certain aa residues including Ala, structure-breaking residues (Pro and Gly), charged residues (Asp and Lys), and polar residues (Ser, Thr, Asn, and Gln). However, the dominating aa residues of the W-domains differed, resulting in acidic (theoretical pI < 5.5) and basic (theoretical pI > 9.0) W-domains. The sequence identity was below 19 % for the PrtP homologs across the different clusters. Sequence diversity of the W-domains also appeared within clusters, particularly in Clusters X and XII, which had 33–93 % and 6–51 % sequence identity, respectively. The W-domains had 87 % and 65 % sequence identity in Clusters V and VI, respectively. Hereby, the W-domain appeared as the most variable domain in the PrtP homologs.
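Flagging candidate ID regions from per-residue pLDDT values amounts to finding long consecutive runs below a cutoff. The sketch below uses the pLDDT < 50 criterion cited in Section 3.3.1; the minimum run length of 30 residues is a hypothetical parameter, since no explicit threshold for a "long" region is given, and the example score profile is invented for illustration.

```python
def low_plddt_regions(plddt, cutoff=50.0, min_len=30):
    """Return 1-based inclusive (start, end) ranges with pLDDT < cutoff.

    Only runs of at least `min_len` consecutive residues are reported.
    `min_len` is an assumed value, not a threshold taken from the text.
    """
    regions = []
    start = None
    for i, score in enumerate(plddt, start=1):
        if score < cutoff:
            if start is None:
                start = i  # a low-confidence run begins here
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(plddt) - start + 1 >= min_len:
        regions.append((start, len(plddt)))
    return regions


# Hypothetical per-residue pLDDT profile: a confident domain, a long
# low-confidence stretch, another confident domain and a short low tail.
scores = [90.0] * 10 + [35.0] * 40 + [85.0] * 10 + [40.0] * 5
print(low_plddt_regions(scores))
```

In the example only the 40-residue stretch is reported; the short low-confidence tail falls below the assumed minimum length and is ignored.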
In Cluster X, the W-domains of the PrtPs ranged from 69 to 189 aa, with an imperfect modular repeat unit of 60 aa. As previously reported (Hansen and Marcatili, 2020), this unit was repeated up to three times and appeared to be unique to this group of PrtPs, as the same tandem repeat was not observed in other W-domains. Two other long tandem repeats were located in the W-domains of PrtH2 and ScpA using T-REKS and XSTREAM. PrtH2 had the longest W-domain, with a tandem repeat of 39 aa that was repeated 2.3 times. The sequence of ScpA repeated 17 aa 3.7 times as previously reported by Siezen, 1999. The repeats were located N-terminal in the W-domains of PrtH2 and ScpA, whereas the unit of 60 aa made up the entire W-domains of the PrtP homologs in Cluster X (
Figure S7). The W-domain with this sequence unit of 60 aa facilitated cell adhesion between
Lactococcus cells as well as between
Lactococcus and epithelial cells [
30]. Interestingly, the degree of cell interaction, which was driven by protein-protein interactions, increased with the number of tandem repeats. The W-domain with a single sequence unit was less accessible for interactions, which hindered or disrupted cell adhesion when the PrtP was cell anchored [
30]. In this case, almost 60 residues appeared to be necessary for displaying a functional protein domain on the bacterial surface, though more than 90 aa were needed in other cases [
31].
Except for the W-domains of PrtP
SK11, PrtP
MS22337 and PrtP
MS22333, the W-domains preceding an LPXTG-like motif ranged from 69 to 94 aa, which in a fully expanded form could span a cell wall of 19–26 nm. The length of these W-domains might affect the efficiency of protein anchoring and protein stability, as described for other Gly/Pro and Ser/Thr rich low complexity linker regions preceding LPXTG-like motifs [
32]. The W-domains of PrtH, PrtB, PrtL and PrtH2 preceded a SlpA-domain, suggesting that these W-domains had a functionality other than that of cell wall spacers. These W-domains varied widely in length, from 45 to 234 aa, and shared no obvious sequence characteristics such as charge or sequence patterns. Hereby, the ID-characterized W-domains appeared to have at least a dual role in PrtP homologs, facilitating surface exposure and/or mediating interactions for cell wall attached PrtP homologs.
3.3.6. Cell wall attachment domains
The cell wall attachment domains of the PrtP homologs followed the W-domain towards the C-termini (
Figure 2). The presence of either an AN-domain or a SlpA-domain reflected two distinct cell envelope attachment mechanisms. Interestingly,
Leuconostoc PrtP
NFICC96H lacked both of these attachment domains and instead terminated after a tail of eight fibronectin-like domains. The lack of a cell wall attachment domain had also been observed for
Lactobacillus PrtPs [
6], implying that termination without an attachment domain might be a widespread domain architecture of PrtP homologs.
Each of the AN-domains started with an LPXTG-like motif, then a region with hydrophobic aa and a short tail of charged residues (
Figure 8). This domain structure corresponded to other LPXTG anchoring domains recognized by the transpeptidase enzyme sortase [
33,
34]. Sortase mediates covalent attachment of secreted proteins to cell walls by cleaving the canonical LPXTG motif between Thr and Gly. This canonical motif was found in six of the PrtP homologs, while the putative motifs for PrtP
NFICC96Q, PrtP
NFICC96W, PrtP
NFICC120, PrtR and ScpA were LAKTA, LPDTA, FPTTN, MPQAG and LPTTN, respectively. The motifs of PrtR and ScpA had previously been reported, but without experimental verification [
14,
16]. However, functional variants of the canonical motif existed among gram-positive bacteria, which entailed a tendency for species-specific motif patterns [
6,
34]. Sortase variants had also displayed altered substrate specificities, allowing substitution within the motif, such as L→M, P→A and T→[ALSV] [
33,
35,
36]. These motif variations were consistent with the observed motif patterns in the PrtP homologs, indicating that the AN-domains, along with the conserved C-terminal region, were most likely functional attachment domains.
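The relationship between the putative motifs and the canonical LPXTG motif can be made explicit by listing, position by position, where each motif deviates. The following is a minimal sketch for that comparison; the function name is hypothetical, and the script only reports mismatches without judging whether a given substitution is tolerated by any particular sortase.

```python
CANONICAL = "LPXTG"  # X denotes any residue


def motif_deviations(motif: str) -> list:
    """List (position, expected, observed) for each mismatch against LPXTG."""
    motif = motif.upper()
    if len(motif) != len(CANONICAL):
        raise ValueError("motif must have five residues")
    return [
        (pos, expected, observed)
        for pos, (expected, observed) in enumerate(zip(CANONICAL, motif), start=1)
        if expected != "X" and expected != observed
    ]


# Putative non-canonical motifs listed in the text.
for motif in ["LAKTA", "LPDTA", "FPTTN", "MPQAG", "LPTTN"]:
    print(motif, motif_deviations(motif))
```

The output shows, for example, that MPQAG differs at positions 1 (L→M) and 4 (T→A), both of which are among the substitutions listed above, whereas several of the other motifs differ at position 5.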
The SlpA-domains occurred as tandem pairs of the Pfam domain PF03217, forming a bipartite three-dimensional structure (
Figure 9). Surface layer proteins such as SlpA form functional crystalline monolayers on the surface of some bacteria, including several but not all Lactobacillus species [
37]. SlpA-domains exist in different proteins, in which they can mediate non-covalent attachment to the bacterial cell wall. Among the analyzed PrtP homologs, the SlpA-domains were only represented in the Lactobacillus PrtP homologs PrtH, PrtH2, PrtB and PrtL (
Figure 2). These SlpA-domains contained 114–136 aa, where the repeating regions consisted of approximately 60 aa and shared sequence identities ranging from 11 to 95 %. All four SlpA-domains were rich in Lys residues (17–32 %) and depleted of acidic aa residues, resulting in domains with high theoretical pI values (pI > 10). As a result, the SlpA-domains would be protonated within the pH range for optimal activity of PrtP homologs, complementing the negative charges of the bacterial cell wall. Electrostatic forces were the main contributor to PrtL attachment [
12], supporting the cell wall anchoring function of the SlpA-domain within PrtP homologs. The AF structure models of the SlpA-domains were of varying quality, which did not support a homology assessment of their predicted three-dimensional structures. The models of the SlpA-domains of PrtL and PrtH2 were uncertain, reflecting unlikely structures that could not be divided into two separate structural domains. In contrast, the structural models of the SlpA-domains of PrtB and PrtH had pLDDT generally above 70, suggesting a generally good backbone prediction (
Figure S5). The overall structures of these SlpA-domains had a similar three-dimensional dumbbell shape, organizing the β-sheets in two regions (
Figure 9). This structural fold corresponded to the structure of other SlpA-domains, where the two β-sheet regions were divided by a spacer region with reduced sequence identity compared to the two tandem regions [
37,
38].