1. Introduction
Venom is a biological toolkit comprised of cocktails of molecules that disrupt the physiology of an organism when it is injected (envenomated) by the organism that produced it. The delivery mechanisms, behaviors, and ecological interactions of envenomation can be studied across the animal tree of life because venom has convergently evolved across most major lineages [
1]. Among known venomous organisms, Cnidaria is the oldest lineage in which venom has evolved. Cnidarians are unique among venomous organisms because they have a microscopic and decentralized venom delivery apparatus, the nematocyst. Nematocysts are made by a kind of cell unique to cnidarians (nematocytes) and are highly diverse in shape, size, and in spatial distribution across the body [
2].
Among cnidarians, venom is best understood in the cnidarian subphylum Anthozoa (corals and sea anemones), and particularly within the anthozoan order Actiniaria (sea anemones). Venom is integral to the biology of sea anemones because they use venom for a variety of functions, including prey capture, defense, digestion, and intraspecific competition [
3,
4]. Due to this biological breadth, the sea anemone venom arsenal is highly diverse in its molecular constituents, with most species making an array of different compounds [
5,
6,
7,
8,
9]. The compounds found in sea anemone venom can be broadly categorized into larger functional categories, and include auxiliary toxins, allergens and innate immunity toxins, hemostatic and hemorrhagic activity toxins, mixed function enzymes, voltage-gated channel toxins, pore-forming toxins (cytolysins), protease inhibitors, and actiniarian toxins of unknown function [
4,
9,
10,
11]. At a genomic level, the genes encoding venom in any given species can vary in identity and in abundance, with abundance reflecting multiple copies of genes: [
12] or differential transcription (variation in normalized expression)[
13] or both.
Studies of genes encoding venom in sea anemones have shown that genes and gene families evolve under purifying selection [
14,
15]. This is distinct from genes and gene families for venom families in other lineages, where diversifying selection predominates [
16,
17,
18]. This initial assessment of evolutionary trends in the genes for venom in sea anemones was based on a limited number of species and only members of the superfamily Actinioidea. Actinioidea is estimated to have diverged from other lineages of sea anemone approximately 450 mya [
19], making it significantly older than venomous lineages like snakes and cone snails, which are estimated to be <100mya [
16,
17,
18]. More recently, the taxonomic scope of study of evolution in venom genes of sea anemones has expanded, but the functional breadth of the venom families considered remains limited, emphasizing neurotoxins [
12,
20,
21,
22].
For sea anemones, as for many groups, the biomedical potential of venom has driven their study. This likely has played a role in the emphasis on neurotoxins, whose analogs in snakes and cone snails have strong potential for pharmaceutical application [
23]. The peptide SHK1, originally isolated from the actinioidean
Stichodactyla helianthus, has been explored for its potential in autoimmune disorders (reviewed in [
22,
24]) and its therapeutic potential has led to studies of its structure [
25], function [
24], and diversity [
12,
22,
26]. Other sea anemone venom constituents, like the amylase inhibitors Helianthase [
27] and Magnificamide [
28] represent novel biological solutions to common metabolic problems like the breakdown of carbohydrates[
29]. Yet other molecules within sea anemone venoms, like NV1 [
22], have both genetic and structural similarities to human peptides and proteins and thus may offer biomedical insights.
Insulin is a widely-occurring and well-studied peptide in vertebrate systems[
30,
31]. It is also found in non-vertebrates, although only a handful of studies have examined the function of insulins across the tree of life. In amniotes, insulin is produced by the pancreas and functions to regulate blood sugar levels by promoting glucose uptake and the absorption of glucose from the blood into liver, fat, and skeletal muscle cells. In non-vertebrates, insulin-like molecules are referred to as Insulin-Like Peptides (ILPs) and typically hold distinct functions. In insects, ILPs regulate metabolism and growth [
32,
33] and are a key factor in reproduction and molt signaling [
34]. In nematodes, ILPs act as pathway antagonists, stopping L1 arrest during development[
35]. In mollusks, ILPs have been tied to neural regulation [
36]. In a few lineages, including bees [
37], monotreme mammals[
38,
39], snakes [
40,
41], lizards [
42], and cone snails [
43,
44], insulin has been recruited into the venom cocktail, where it may block site-specific channels and/or act indirectly on prey by disrupting insulin homeostasis. The best-studied ILPs in a venomous system is in marine cone snails [
17,
43,
44,
45], in which ILP acts antagonistically towards prey by causing hypoglycemic shock, enabling the snail to slowly eat their prey [
45].
Sea anemones produce ILPs that closely resemble those of cone snails [
46]. These toxins are referred to as Cono-Insulins [
46,
47], distinct from the neurotoxic conotoxins also found in cone snails. In their study of venom of the undescribed actinioidean sea anemone,
Oulactis sp., Mitchell and colleagues [
46] found that Cono-Insulins showed signs of binding to both Kv1-3 binding sites and insulin growth receptors. Sequences encoding Cono-Insulins also occur in the genome of
Nematostella vectensis [
48]
Cono-Insulin is transcribed as a peptide with three chains (A+B+C) that undergoes post transcriptional modifications to cleave off the C chain and form a mature pro-peptide (A+B) [
36]. The structure of the peptide offers an opportunity to examine selection across its functional chains. Work by Gibbs [
16], and Jouiaei[
15] and their colleagues point to the possibility of certain regions in venom encoding genes evolving rapidly under the influence of adaptive selection.
The only other known instance of an insulin-derived toxin is the protein VP302, originally discovered in the transcriptome of a swimming scorpion [
49]. VP302 has also been identified in other cnidarians, including in transcriptomes of four species of Cerianthiaria [
50], in the genome[
51] of
Hydra vulgaris, and in clownfish-hosting sea anemones [
9]. Studies which annotate venom coding genes functional families have found that VP302 match to the protein family (PFAM) insulin receptor growth factor [
9,
50].
The diversity and evolutionary history of ILPs in the venom of actiniarians is poorly characterized [
47]. The commonness of Cono-Insulin transcripts in species that span the sea anemone phylogeny suggests that ILPs may be found across all sea anemones, but their diversity is undocumented, both within taxa and across the breadth of Actiniaria. Fu and colleagues[
47] interpret sea anemone ILPs to have a single origin relative to other metazoans, but their study did not include Cono-Insulin sequences from non-anemone cnidarians or exemplars of VP302. The breadth of occurrence of VP302 and its evolutionary history within sea anemones or Cnidaria is unknown.
Previous work on venom genes in actiniarians predict that ILPs will evolve under negative selection [
14,
15], but whether generalization from neurotoxins to ILPs is appropriate remains untested. Furthermore, genes encoding venom may undergo episodic and site-specific selection [
14,
15], and punctuated equilibrium may represent a more appropriate model in at least some actiniarian neurotoxins [
12,
14].
In this study, we document the diversity of ILPs in the transcriptomes of 34 species of sea anemone. Additionally, we model the structure of exemplar ILPs based on sequences from each of the major clades of actiniarians from which we document ILPs. We analyzed the ILPs we recovered to infer 1) the evolutionary history and diversity of each type of ILP within Actiniaria and 2) the mode of selection for the genes encoding ILPs in sea anemones. We explicitly consider both kinds of ILPs previously identified in sea anemones, Cono-Insulins and VP302.
2. Results
2.1. Identification
We recovered ILPs in 28 of the 34 species we studied. We found no sequences that matched ILPs in the transcriptome of the sole representative of superfamily Actinostoloidea, Stomphia coccinea (family Actinostolididae). We also found no ILPs in transcriptomes from the actinioideans Epiactis prolifera and Macrodactyla doreensis (both family Actiniidae), nor in those of the metridioidean Lebrunia danae (family Aliciidae).
The ILPs we recovered from across Actiniaria matched two previously described venom types: 1) Cono-Insulin toxins found in cone snails [
52,
53]; and 2) Venom Protein 302 (VP302) originally described in
Lychas mucronatus (Chinese swimming scorpion). Transcripts with high match to Cono-Insulins were recovered in 20 species. Transcripts with high match to VP302 were recovered in 21 species. Nineteen species had transcripts of both Cono-Insulins and VP302, and five species had no transcripts associated with either Cono-Insulins or VP302 (
Table 1). We recover multiple isoforms for both Cono-Insulins and VP302, which our phylogenetic analyses designate as belonging to separate clades (see below). In seven of the 21 species having a VP302 transcript, we recover two isoforms; in ten of the 20 species having a Cono-Insulin transcript, we recover multiple isoforms (
Table 2).
Transcripts that closely match Cono-Insulins were transcribed as a singular domain transcript of 107 amino acids, with a signaling region at amino acid (AA) position at 1-26 and the mature peptide at AA position 27-107. For our phylogenetic and selection analysis, we only used the functional Cono-Insulin (Chains A+B), which contains six conserved cystines. From UniProt, we sourced transcripts of other metazoans known to produce Cono-Insulins and used them in our alignment (
Figure 1). All the reads showed high levels of similarity across all taxa. Among sea anemones, we found that chain A had more conserved AA than chain B (
Figure 1).
VP302 was found within a multi-domain transcript with 1-4 domains. The multi-domain transcript maps to a region of the genome of
Exaiptasia diaphana (GCA_001417965.1) that is un-scaffolded and is annotated as a four-domain Kazal protein (
Figure 2). Considering the model sea anemone species
Exaiptasia diaphana [venom302_TRINITY_DN2905_c0_g1_i25] as an example, VP302 has a signaling region at AA 1-14 and an insulin-like growth factor-binding proteins (IGFBP) domain from AA 15-96. A BLAST search of this region on UNiPROT returned matches with IGFBP proteins from a range of taxa, but its top hit (bit score 71.1, E value 5e-13) was to A0A8B7DI77_HYDVU, a sequence from the hydrozoan cnidarian
Hydra vulgaris annotated as “Venom protein 302-like.” When aligned across all sea anemones, VP302 was found to have a conserved region at positions 37-67 and a total of 11 conserved cystines (
Figure 3).
VP302 is followed by a Kazal domain protein, encoded at AA position 100-167 in the transcript from the sea anemone
E. diaphana. The top hit for this region (bit score 182, E value of 6e-52) is to A0A913YN61_EXADI Uncharacterized protein from
E. diaphana. Its top hit with an annotation is A0A8B6XI09_HYDVU BPTI/Kunitz domain-containing protein 4-like (bit score 179, E value 5e-10) from
Hydra vulgaris. An intron occurs at AA position 168-195. Following the Kazal protein, at AA 196- 256, is a region that matches a PFAM domain Antistasin; the BLAST search of UNiPROT had its top hit (bit score 49.1, E value 8e-05) A0A817K5N8_9BILA, an Antistasin-like protein from an unknown rotarian protist. Antistasin is better known as blood coagulation factor Xa/proclotting enzyme inhibitor and was previously noted as a possible venom constituent in clownfish hosting anemones [
9]. The region from AA 257-314 had a PFAM annotation of BPTI/Kunitz domain; its top hit (bit score 73.3, E value of 1e-14) via a BLAST search of UNiPROT was to A7SDB9 A7SDB9_NEMVE, a BPTI/Kunitz inhibitor from the sea anemone
Nematostella vectensis. BPTI/Kunitz domain proteins are also known as venom Kunitz-type proteins.
In all sequences examined here, the four exons occurred in the same order: IGFBP (VP302) > Kazal > Antistasin > Kunitz_BPTI. The studied species of sea anemone show variation in number of transcripts, the length and sequence of the elements within the multidomain transcript, and the number of venom genes within those transcripts (
Table 2). We found only one deviation from this canonical order: one transcript from
Actinia tenebrosa [A0A6P8HV62_ACTTE] lacked an Antistasin and had a tandem repeat of Kunitz_BPTI. All other transcripts from
Actinia tenebrosa followed canonical order, even when they varied in the number (occurrence) of genes within the multidomain transcript.
2.2. Phylogenomics
We find Cono-Insulin genes distributed into four clades (
Figure 4). Each clade includes genes from species belonging to multiple superfamilies, but in no clade does the gene tree match the expected species tree. The earliest branching clade, denoted Clade 3, is sister to the remaining three lineages and contains the fewest sequences, with genes from the hydrozoans
Hydra and
Clytia and four species of actiniarians: the metridioideans
Diadumene lineata,
Telmatactis australis, and
Triactis producta and the actinioidean
Condylactis gigantea (
Figure 4). Clade 2 is the most species-rich clade, with sequences from 16 species; Clade 1A has sequences from 13 species and Clade 1B has sequences from 11 species. Clades 1 and 2 include only sequences from Anthozoans, with transcripts from the scleractinians
Pocillopora damicornis and
Stylophora pistillata recovered in Clades 1 and 2 (Clade 1B:
S. pistillata; Clade 2:
S. pistillata,
P. damicornis). In general, there are more shared species between Clade 1 and 2 than between Clades 1A and 1B: Clades 1A and 1B share only two species; Clades 1A and 2 share nine species, and Clades 1B and 2 share six species. The metridioidean
Metridium senile and the actinioidean
Heterodactyla hemprichii are the only species with transcripts in Clades 1A, 1B, and 2. No species has sequences in all clades, or in Clade 1, 2, and 3.
The gene tree from our phylogenomic analysis includes three clades of sequences for VP302 (
Figure 5). Clades B and C are more closely related to each other than either is to Clade A, but precise relationships among the three clades are unclear. Clade A has the fewest number of sequences and the least diversity in terms of species, including only three species, all members of the metridioidean subclade Cuticulata:
Nemanthus annamensis, Calliactis polypus and
Telematics australis. Clades B and C both include representatives from Actinioidea and Metridioidea and in those clades, the gene trees largely reflected relationships expected based on the species phylogeny. Clade B has sequences from nine species and Clade C has sequences from 15 species. Overlap in terms of the species that contribute sequences to each clade is greatest between Clades B and C, which share six members; Clades A and B share one species, and Clades A and C share none. Reads which in Clade C only contained sequences that only coded for VP302 and not any multidomain proteins.
Figure 5.
Genes tree of VP302. Tree depicts each clade in a distinct color. Clade A and B are interpreted to have arisen via a duplication event, whereas Clade C implies and independent lineage of VP302.
Figure 5.
Genes tree of VP302. Tree depicts each clade in a distinct color. Clade A and B are interpreted to have arisen via a duplication event, whereas Clade C implies and independent lineage of VP302.
2.3. Predicted Protein Structure
Across the four clades of Cono-Insulins, the transcripts we used to predict structure did not converge on a single structural template (
Table 3). Furthermore, no one clade had any specificity to any one template. We found that templates based on AlphaFold-predictions were used for transcripts belonging to Clade 1B from
Nematostella vectensis and
Oulactis sp. and for transcripts belonging to Clade 2 from
Oulactis sp.
In general, predicted Cono-Inulin protein structures that did not use an AlphaFold template had low Global Model Quality Estimate (GMQE) scores (<.50). The three predicted Cono-Ins which used AlphaFold models have a GMQE score of (>.60) but (<.66). Sequence identity followed the same pattern, with non-AlphaFold sequences having a score of (<.40) and AlphaFold templates having (>.80). Coverages varied across all predicted proteins, but generally with scores (>60) see
Table 3.
We found that predicted proteins structure for Cono-Insulins fell into two categories, those that used Swiss-Prot templates and those that use AlphaFold templates. The predictions that used a Swiss-Prot template all are inferred to have two canonical loops that form a large singular loop that folds back to itself and a 3-4 loop barrel. The proteins that were modeled on an AlphaFold template differed by not folding onto themselves, instead pointing away and forming a four-loop barrel (REF FIG 6).
VP302 showed less variation in shape, with all but two predicted proteins matching the template [3tjq.1.A Serine protease HTRA1]. The predicted protein of C
ryptodendrum adhaesivum from Clade B used the template [A0A6P8IHV9.1.A Four-domain proteases inhibitor-like], whereas the predicted protein for
Telmactis australis. in Clade C matched to [1wqj.1.A Insulin-like growth factor binding protein 4]. In general, for VP302, protein structure did not vary in any significant way: it comprised two larger adjacent loops and a third incomplete loop that in some cases included a smaller tight loop (
Figure 6). All but the predicted protein for
Telmatactis australis in clade C had a GMQE score >50. Coverage for VP302 predicted proteins was generally high (0.87-0.98) except for the predicted protein for
Actinia tenebrosa (0.43) in Clade B (
Table 3).
2.4. Sites under Selection
Based on the results of our gene trees for both of our venoms, we evaluated omega (dN/ dS) for the alignments for each independent clade and the across all clades (global). We used MEME to find sites that may have undergone episodic positive selection and the program FUBAR to infer sites under positive and negative selection. Additionally, for Cono-Insulin, we tested selection in functional chains A and B independently to determine whether the chains have distinct evolutionary pressures (
Table 4). This allows us to determine whether there were clade-specific differences in the nature or strength of selection.
The global omega value (0.0211) for Cono-Insulins indicates strong negative selection as the prevailing mode of evolution. Omega ranged from 0.0075-0.156 in the constituent clades (
Table 4). Despite finding no sites that inferred to have undergone episodic positive selection when all sequences are considered together, when considering each clade independently, MEME detected positive selection at one site in Clade 1B, three sites in Clade 1 (Clade 1A+ 1B), three sites in Clade 2, and four sites in Clade 3. Evaluating the sequences not by their clade but by their functional chain comes to a similar result: the overall value for omega is <1 for both Chain A and Chain B, with Chain A having have two sites inferred to be under positive selection and Chain B having none.
FUBAR found no sites under positive selection and 42 sites under negative selection when looking at all sequences in all clades for Cono-Insulin. No sites in Clade 1 (Clade 1A + Clade 1B), Clade 2, and Clade 3 were inferred to be under positive section, although one site within Clade 1B was inferred to be under positive selection when only those sequences were considered. FUBAR detected signatures of negative selection at 42 sites across the genes in the gene tree. Different sites were interpreted to be under negative selection in each lineage: 42 sites in Clade 1A, 32 sites in Clade 1B, 26 sites in Clade 2, and 11 sites in Clade 3. Although many of the sites are shared, the 42 sites interpreted to be under negative selection in Clade 1A and in the gene tree overall are not identical. In Clade 2, FUBAR found no sites under positive section and 26 sites under negative selection. For Clade 3, sister to both Clade 1 and 2, FUBAR found no sites under positive section and 11 sites under negative selection.
The overall omega value for VP302 was 0.172, indicating negative selection. MEME detected a total of 10 sites under positive selection when all sequences were considered. Clade A had the highest omega value (0.256) and no sites under positive selection. Clade B had the lowest omega value (0.0979), and MEME found 5 sites under positive selection. Clade A + Clade B had an omega value of 0.119 and seven sites inferred to be under positive selection. Clade C, sister to Clades A + B, had an omega value of 0.178 and two sites inferred to be under positive selection.
FUBAR identified no sites under positive selection when looking at all clades of sequences or when considering the sequences of each clade independently. It detected 30 sites under negative selection in Clade A, 43 sites in Clade B, and 26 in Clade C. Considering Clade B+C identified 45 sites under negative selection, including several sites not interpreted as under negative selection when the sequences of each of the constituent subclades were considered separately.