2.3. Sequence Analysis and Characterization of Waglerin’s Precursor Gene
The transcriptome unveils the precursor gene of waglerin through the transcript TwBNP01, which contains a full-length sequence and has the highest expression level among all toxins expressed in the transcriptome (FPKM=2,47,433; 75.19% of all-toxin FPKM) (
Figure 1). BLASTp analysis matches TwBNP01 with 100% amino acid sequence coverage and 93.3% sequence identity (including additional or duplicated residues) to the gene UMK70519 (GenBank accession) found in the public database. The gene UMK70519 was derived from another species,
Tropidolaemus subannulatus (locality: North Philippines, sequenced by Sangar’s method) [
49], and is deposited in the databank as “waglerin peptide 2” notwithstanding sequence variation from the original sequence of waglerin isolated from
T. wagleri venom [
50]. Both TwBNP01 and UMK70519 sequences are homologous to snake venom ACEI/BPP-CNP genes while exhibiting multiple mutations (mainly substitutions and deletions) accounted for sequence variation observed between the two (
Figure 2A). The comparison suggests both TwBNP01 and UMK70519 are orthologous genes in
T. walgeri (current study) and
T. subannulatus [
49], respectively.
T. wagleri and
T. subannulatus are sister species that diverged from a common ancestor, and are both asympatric with non-overlapping biogeographic distribution [
13]. Typically,
T. wagleri distributes in the western part of the Indo-Malayan Archipelago whereas
T. subannulatus are found eastward in islands such as Sulawesi and the Philippines. The venoms of the two congeneric species show remarkable variation in their reverse-phase chromatographic profiles in spite of close phylogenetic relationships [
12]. While waglerin is highly expressed in the
T. wagleri venom proteome (close to 20% of total venom proteins), a comprehensive venom proteome or protein isolation study is yet available to identify and quantify the corresponding orthologous protein in the
T. subannulatus venom. Similarly, its expression level in a full transcriptomic study has yet to be reported for comparison with that of
T. wagleri.
Waglerin was first purified from
T. wagleri venom in the early 1990’s, and fully sequenced using the Edman degradation method (UniProt ID: P58930) [
50,
51]. In the present study, we identified the waglerin-coding region in the transcript TwBNP01 sequence based on a 100% identity that matches the amino acid sequence of P58930:
1SLGGKPDLRPCYPPCHYIPRPKPR
24. The 24 amino acid residues of waglerin were embedded in the propeptide regions corresponding to the position from 58
th to 82
nd amino acid residues in the sequence of TwBNP01 transcript (
Figure 2A). The waglerin sequence, however, shows variation from the corresponding region of UMK70519 in which two amino acid residues are non-synonymously substituted in
T. wagleri and
T. subannulatus:
17Y→
17H,
18I→
18R (
Figure 2A). In both mutations, the hydrophobic residues tyrosine and isoleucine (
T. wagleri) are substituted for the positively charged histidine and arginine (
T. subannulatus). Theoretically, the gene UMK70519 of
T. subannulatus would produce a “waglerin-like” peptide with a higher peptide basicity than waglerin (P58930) and that of TwBNP01 in view of the substitutions. Indeed, computational predictions suggest the waglerin peptide (
T. wagleri, current study) has a pI of 9.69 and a molecular mass of 2748.26 Da, while the waglerin-like peptide (
T. subaanulatus) has a pI of 10.31 and a molecular mass of 2765.26 Da. This has potential implications on the toxin activity, as a previous mutagenesis study showed the active site of waglerin is governed by basic residues that reside in the proximity of its disulfide linkage (shown in
Figure 2A) [
32,
52]. Furthermore, it was suggested that the closer the basic amino acid residues are to the disulfide bond, the higher the waglerin toxicity would be [
52]. Thus, the current finding implies waglerin from
T. wagleri is relatively more conserved while substitutions occurred in UMK70519 (
T. subannulatus) may result in enhanced toxicity of its waglerin-like peptide. This mutated trait could be a key innovation for speciation that facilitates
T. subannulatus in exploring and adapting to a new ecological niche as it diverged from
T. wagleri.
A BLAST search applying the sequences (24 amino acid residues) yielded a minimal return of homology matches in the database, confirming waglerin and waglerin-like peptides are exclusively found only in the
Tropidolaemus clade of Asiatic pit vipers. A previous study by Schmidt et al. [
51] suggestd similarity between waglerin and snake venom bradykinin-potentiating peptides (BPPs) by showing that waglerin and a BPP (from
Bothrops jararaca) [
53] are proline-rich with 50% homology. In comparison, the classical snake venom BPP has no disulfide bonds, contains fewer residues than waglerin, and has a blocked amino terminus (pyrrolidonecarboxylic acid as the amino-terminal residue). The current work took a closer look into the sequences of waglerin and various BPPs, and in addition, azemiopsin from the Fea’s Viper (
Azemiops feae, subfamily: Azemiopinae) (
Figure 2B). Alignment for homology inspection on these sequences shows the original waglerin (P58930) and the waglerin-coding sequence of TwBNP (current work) are indeed identical (100% identity) but varied from that of UMK70519 (
T. subannulatus) (91.67% identity), and there is a considerably low degree of similarity between the
Tropidolaemus peptides and those of BPPs of the other genera (~50% identity and below) including
Bothrops,
Crotalus,
Protobothrops,
Trimeresurus, and
Azemiops. Unlike the waglerin peptide, snake venom BPPs have no cysteine residues and thus the absence of disulfide bonds, contain fewer residues, and tend to form a blocked amino terminus where glutamine or glutamic acid forms pyrrolidonecarboxylic acid as N-terminal residue (
Figure 2B). These short peptides, nonetheless, are characterized by a high content of proline (P) residues which render them resistant to peptidase hydrolysis [
54,
55]. Notably, the “origins” of all these proline-rich peptides are embedded within bigger precursor genes of variable lengths recruited by different lineages in the Viperidae family.
The protein-coding regions of precursor genes which share a homology of BPP/ACEI-CNP gene are hyper-variable, and have yet to be fully characterized for a number of species including the
Tropidolaemus pit vipers. Thus, we further deduced the various putative protein-coding regions in TwBNP01 and UMK70519 sequences with reference to the well-characterized BPP/ACEI-CNP gene of
Lachesis muta muta (UniProt ID: Q27J49) for its sequence coverage of 100% matched to TwBNP01 (
Figure 3). The analysis confirms that both sequences of
Tropidolaemus species share a conserved structure resembling the BPP/ACEI-CNP gene of
L. m. muta, containing a signal peptide sequence, a long propeptide sequence spanned by multiple short peptide-coding domains, followed by a long spacer gene with an intriguingly high content of repeated glycine (G) residue within a well-conserved poly-His-poly-Gly region toward the C-terminus, where the CNP is encoded for. In the precursor gene Q27J49 of
L. m. muta, there are five domains coding for BPPs, one domain for bradykinin inhibitor peptide (BIP), and one CNP-coding domain [
56,
57]. Classically, BPPs are modular peptidic molecules with a C-terminal QIPP or HIPP, and post-translationally modified N-terminal pyroglutamic acid (tryptophan in some variants) [
55]. In comparison, these domains are not recognizable at all in both sequences of TwBNP01 (
T. wagleri) and UMK70519 (
T. subannulatus), suggesting
L. muta muta evolves the BPPs and BIP independently through mutations later at the genomic level. Its first BPP (
34WPPRPQIPP
42) and second BPP (
50QKPWPPGHHIPP
60) sequences show multiple substitutions from the corresponding segments in TwWAG01 and UMK70519, where the mutation events significantly increase the content of proline residues while casting the C-terminal QIPP or HIPP motifs, which are characteristic of BPPs (
Figure 3). Its third BPP domain (
65QEWPPGHHIPP
75) emerges more likely through duplication, while the fourth and fifth BPP are again the results of substitutions. The BPI domain (
137TPPAGPDVGPR
147) in Q27J49 is novel and not present neither in TwWAG01 nor UMK70519 sequences. The CNP gene sequences, nonetheless, are conserved in the three species, in particular between TwWAG01 and UMK70519, where there is only one substitution (A→T) occured. The
L. m. muta sequence is on the whole longer (239 amino acid residues) than the
Tropidolaemus sequences (203-209 amino acid residues) (
Figure 3), with suggestive features of duplication within its long propeptide and spacing gene regions that permits mutation for molecular novelty.
2.4. Divergence and Conservation of BPP/ACEI-CNP Genes among Diverse Genera
A wider comparison involving the transcript sequence of TwBNP01 and ACEi/BPP-CNP precursor genes from multiple genera of Crotalinae, as well as
Azemiops feae, was illustrated in
Figure 4. Three regions with considerable conservation were identified amidst the variable sequences: (1) the signal peptide region and part of the initial propeptide (see overall residue numbering 1-28), (2) a long spacer gene involving the propeptide starting from PHESPAGGT- (residue numbering 159-181) excluding the BIP region, (3) the CNP domain and its preceding propeptide (residue numbering 199-306) (numbering based on the sequence of TwBNP01). Across the sequences, the three conserved regions were bridged by highly variable amino acid residues of varying lengths. The variable segments form hypermutable regions within the propeptides, permitting the proline-rich peptides (waglerin, waglerin-like peptide, BPP, azemiopsin) to evolve
de novo from the large precursor genes [
49], with neo-functionalization in adaption to ecological niche and available prey for the different species.
Notwithstanding, the signal peptides are highly conserved among the genes compared. TwBNP01 and UMK70519 sequences have identical signal peptides which are, however, slightly variable from the rest of the species involving substitution of the 9
th and 28
th amino acid residues. Both
Tropidolaemus spp. have
9Gly which was substituted for
9Ser in other species and
9Cys in
A. feae. The
28Glu in
Tropidolaemus was more variably substituted for Gln, Gly and Tyr in the other species. The signal peptide for
Tropidolaemus can therefore be considered as genus-specific, and potentially a useful genetic marker. The second conserved region containing residues
159PHESPAGGXTALREELSPGPEAA
181 is of less predictive value for
Tropidolaemus genes. Between TwBNP01 (
T. wagleri) and UMK70519 (
T. subannulatus), a substitution was found (
168T→
168M), while this region in TwBNP01 is identical to that of
C. rhodostoma (BAN04688). Interestingly, the gene K4IT20 from
A. feae is much variable due to its ambiguous sequence which suggests deletion of certain amino acid residues that follow the coding sequence for azemiopsin, preceding the supposedly conserved region. Lastly, the third conserved region comprising propeptides before the CNP domain is recognizable by the characteristic repeating poly-Ala and poly-Gly motifs across all species. The findings support the orthologous origin of the large precursor genes among these basal and advanced viperid snakes, while highlighting regions of vast variability that give rise to the emergence of diversely small, proline-rich bioactive peptides, notably waglerin, azemiopsin, BPP and BIP in their venoms. This concurs with the hypothesis of hyper-mutatable propeptide regions being a source of molecular novelty in the course of the diversification of advanced snakes [
49].
Phylogenetic analysis of the BPP/ACEi-CNP precursor genes further revealed polyphyletic clades which conform to the biogeographical distribution of the species in comparison (
Figure 5). The azemiopsin-containing precursor gene (K4IT20) is an outgroup to the rest of crotalid (pit viper) BPP/ACEi-CNP genes, consistent with its taxon as a separate subfamily (Azemiopinae). In the pit vipers, two major paraphyletic clades are observed on the tree topology, namely the Old World group (represented by
Calloselasma,
Trimeresurus,
Protobothrops,
Tropidolaemus) and the New World group (
Bothrops,
Lachesis,
Crotalus,
Sistrusus), respectively, while the Asiatic
Gloydius spp. appear closer to the latter (
Figure 5). In the Old World clade, BAN04688 from
C. rhodostoma remains basal in the tree, and the ancestral protein further evolves along the diversification of
Trimeresurus,
Protobothrops and
Tropidolamus spp. (
Figure 5). Of note, TwBNP01 (in which lies the waglerin peptide) originated from the Malaysian
T. wagleri forms a monophyletic clade with the waglerin-like-conding gene (UMK70519) of
T. subannulatus from North Philippines, with the latter sequence appears to be more diversified. The tree topology also shows an earlier diversification of BPP/ACEi-CNP proteins in the
Gloydius spp. from members of their sister taxa in the Old World. The ancestral protein of
Gloydius spp. is recruited multiple times by the more derived New World pit vipers then (
Figure 5). From the perspective of phylogeography, the
Gloydius spp. are regarded as the most eastward dispersed Asiatic pit vipers, and are likely related to ancestral species which cross the Bering Bridge and radiate into the New World.
In
Figure 4, multiple sequence alignment shows that waglerin/waglerin-like peptides and azemiopsin emerge within the hypermutatable propeptide region, which otherwise gave rise to BPPs in the other species. The bradykinin-potentiating peptides (BPPs) and C-type natriuretic peptides (CNPs) of snake venoms exhibit vasoactive blood pressure-modulating activity, thus increasing the venom efficacy by subjecting the prey to a secondary subduing effect of the venom, e.g., a sudden reduction in blood pressure [
55]. Although the precursor genes of TwBNP01 and UMK70519 contain the CNP regions, no expressed proteins have been found in their venoms, while BPP-coding sequences simply never emerge
de novo in both. We infer this as a phylogenetic constraint in which
T. wagleri and
T. subannulatus have no mechanistic way of evolving a venom phenotype that induces hypotension in prey. Instead, these pit vipers exclusively produce distinct peptides which functionally converge with alpha-neurotoxins (neurotoxic three-finger toxins) of the elapid species. In fact, waglerins are highly expressed as the principal toxins of
T. wagleri, while in other snake venoms, BPPs and CNPs are expressed at very low or negligible amounts, indicating secondary or ancillary functions [
12,
34]. The twist in the course of venom evolution taken by the
Tropidolaemus species leads to the innovation of neurotoxic trait in this clade, where waglerin and waglerin-like peptides are neo-functionalized as the principal toxins to target and antagonize the neuromuscular nicotinic receptors, producing paralysis as a rapid way of subduing the prey, especially in the arboreal habitat [
33,
58,
59]. The use of small neurotoxic peptides is exemplified by another basal group of viperid snakes, i.e.,
Azemiops spp. (Fea’s vipers), from which the 21-amino acid residue peptide (also proline-rich) azemiopsin (UniProt ID: B3EWH2) is derived [
60]. Hence, within the Viperidae, the neurotoxic trait has been amplified on at least two separate occasions independently, once in the lineage of
Azemiops, and again in
Tropidolaemus (
Figure 5), notwithstanding their taxonomic divergence into two separate subfamilies, Crotalinae and Azemiopinae, and their distinct habitats (arboreal
vs. ground-dwelling). The convergently evolved neurotoxic activities of waglerin and azemiopsin are taxon-specific though, resulting in differential effects in various animals where amphibians, avians and rodents appear to be more susceptible than human [
59]. Accordingly, these taxon-specific neuroactive peptides rarely result in significant neurotoxic syndrome in human envenoming cases but can be used as lead compounds in drug discovery and receptor probes in biomedical research [
61,
62].
To note, the BPP/ACEI-CNP gene of
T. wagleri embraces the coding region of waglerin, which is the most extensively studied venom protein originated from this species. Comparing the transcriptomic profile (current study) and reported proteome, the BPP/ACEI-CNP transcripts and waglerin proteins are both the most abundantly expressed [
12,
34]. This distinct venom characteristic is virtually exclusive to
T. wagleri, representing a unique phenotype not typically evolved in other pit vipers. Other toxins expressed in the gland are derived from protein families shared across various lineages of hemotoxic pit vipers. These include enzymes such as proteases (SVMPs and SVSPs), phospholipases A
2, L-amino acid oxidase, and non-enzymatic C-type lectins (snaclecs) and vascular endothelial growth factors. The amounts of transcripts transcribed and proteins translated, however, do not correlate exactly considering factors such as the dynamic regulation of gene expression, differences in the synthesis rate and half-life of mRNA, and effect of post-translational modification [
44,
45]. The lack of correlation between venom gland transcriptomics and proteomics is not uncommon, as obsevered in various other species [
37,
44,
46,
47,
48]. In these studies, the transcriptomic approach is used to reveal the gene expression pattern of all toxins at a single time point resembling a snapshot, typically a few days after snake venoms were extracted as an attempt to stimulate venom replenishment in the glands. On the other hand, venom proteomes show proteins that have been produced, matured, post-translationally modified and accumulated in the glands over an unspecified duration. Nonetheless, the transcriptomic approach applying
de novo assembly offers unmatched information on the expression patterns and structures (amino acid sequences) of species-specific toxins, with traceability to distinct genomic loci. This permits a better understanding of the evolutionary origin of venom, and how mutations could have modulated a venom’s phenotype in terms of biological function and toxicity. The insight is important for the characterization of a unique gene, as exemplified by the full-length BPP/ACEI-CNP gene of
T. wagleri for coding waglerin as its the most abundantly expressed toxin.