1. Introduction
The family Noctuidae is highly diverse, with almost 12,000 species, and is the third largest family within the order Lepidoptera [
1]. Many members of this family are phytophagous insects that are highly harmful to crops and forests. Despite the large number of previous studies on the morphology, physiology, and biological control of Noctuidae species [
2,
3], our understanding of the genomic diversity of Noctuidae species, especially of their transposable elements (TEs), is still in its infancy.
TEs are a class of repetitive sequences dispersed throughout the genome. They are an essential component of eukaryotic genomes and can jump around the genome, within the same chromosome or between different chromosomes, and can even transfer horizontally between species [
4,
5].
TEs move through genomes using either a “cut-and-paste” or a “copy-and-paste” mechanism, and they have important impacts on the architecture, function, and evolution of the host genome [
6,
7,
8].
In recent decades, many insect genomes have been assembled. Insect genomes vary significantly in size, ranging from 6.5 Gb in
Locusta migratoria to less than 0.1 Gb in
Tetranychus urticae. However, the numbers of genes in these genomes are similar, and the differences in genome size are mainly due to variation in TE content [
9]. Previous studies on Arthropoda, Lepidoptera, and the genus
Drosophila suggested a positive correlation between genome size and TE content [
7,
9,
10]. Nevertheless, it is especially challenging to study insect TEs for several reasons. 1) Insect genomes vary greatly in size and in the proportion of TEs. For example, two species in the order Diptera,
Aedes aegypti and
Belgica antarctica, have genomic TE contents of 55% and 1%, respectively. Even in the same genus,
D. simulans and
D. ananassae differ substantially in TE content, at 10% and 40% of the genome, respectively [
11]. 2) Many lineage-specific TEs exist in different insect genomes, such as the
Zisupton subfamily that is specific to coleopteran genomes [
11]. 3) The TE composition is also highly variable in insect genomes. For example, DNA transposons are the predominant TE class in
Heliconius melpomene, a species in the order Lepidoptera, whereas they make up a very small proportion of the genome in
Papilio polytes, another species in the same order [
7]. 4) TE propagation differs significantly among insect genomes. Wu’s study found that, of 14 arthropod species, only the silkworm had a large number of recently expanded TEs, which were probably responsible for its adaptation to domestication [
9].
In addition to the large variation in TEs, another challenge in studying insect TEs is the existence of horizontal transfer of TEs (HTT) in insect genomes. TEs can pass from one host to another in two ways. The first is vertical inheritance, in which they are passed from parents to offspring. The second is horizontal transfer, which occurs between organisms that do not mate [
12]. Horizontal transfer allows TEs to jump from an old host to a new one. In the old host genome, natural selection and silencing mechanisms can suppress TE propagation or delete TEs from the genome. However, when a TE inserts into a new host genome, it can escape this suppression and extinction [
13]. Therefore, HTT plays an important role in the long-term survival of TEs. Since the first HTT event was reported in
Drosophila melanogaster [
14], a total of 2,836 HTT events had been recorded in HTT-db by 2017 [
15]. One of the most recent HTT events occurred after 2010 in
D. simulans. A
P element was horizontally transferred from
D. melanogaster into the
D. simulans genome, and the
P element has been found in populations of
D. simulans only after 2010 [
16]. A previous study suggested that the order Lepidoptera is a hotspot for HTT events [
17]. However, although Noctuidae is one of the largest families in the order Lepidoptera, there has been no comprehensive study to date of HTT events among Noctuidae species or between Noctuidae species and non-noctuid arthropods.
By 2020, genome assemblies were available for ten species of Noctuidae, belonging to seven genera. In this study, we used multiple prediction methods to annotate and characterize TEs in the genomes of these ten species, in order to reveal the genomic diversity of TEs in Noctuidae and the correlation between TE content and genome size in noctuids. We also investigated how different TE classes and subfamilies expanded or contracted in noctuid genomes. Finally, we estimated HTT events among the genomes of Noctuidae species, as well as between Noctuidae species and other arthropods, and examined how HTT events affect the evolution of Noctuidae genomes.
4. Discussion
4.1. TEs shape the genome diversity of Noctuidae species
Unlike protein-coding genes, which are under selective pressure, TE sequences are usually not subject to selective pressure and thus change rapidly [
46]. In addition, TE expansion and contraction occur at high frequency in arthropod genomes [
11], leading to enormous variation of TEs in arthropod genomes. To date, little is understood about TE characteristics and genome-wide TE diversity in Noctuidae species. This study constructed a consensus sequence library for ten Noctuidae species, containing 1,038 to 2,826 TE consensus sequences per genome, and found that TEs varied greatly among Noctuidae species, even among species of the same genus. The genomic TE content also varied greatly (from 11.33% to 45.1%) among the ten species. This high variation in TE content among closely related species is consistent with previous studies on Lepidoptera species, where TEs account for 4.7% to 38.3% of the genomes [
7], and on Insecta species, where TEs account for 1% to 55% of the genomes [
11]. It has been suggested that the increase or decrease of TE content is the most important factor affecting the genome size of arthropods [
10]. Similarly, in the Noctuidae, we found a strong positive correlation between TE content and genome size (r > 0.8, p < 0.01). In particular, LINE and DNA transposons contributed most to genome size, whereas SINEs showed no significant correlation. However, a study based on more than ten arthropods found that LINE, SINE, LTR, and DNA transposons were all positively correlated with genome size (r > 0.6) [
9]. The discrepancy might be due to the relatively low content of LTR and SINE elements in noctuid genomes compared with other arthropods.
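As an illustration of the kind of test behind the reported correlation, the following minimal sketch computes a correlation coefficient between genome size and TE content. It assumes that r is a Pearson coefficient, which this section does not state. The function name `pearson_r` and all numeric values are hypothetical placeholders and are not the data of this study.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical example values (NOT taken from the study):
# genome sizes in Mb and TE content in percent for six imaginary species.
genome_size_mb = [340.0, 380.0, 420.0, 480.0, 560.0, 800.0]
te_content_pct = [12.0, 15.0, 22.0, 28.0, 33.0, 44.0]

r = pearson_r(genome_size_mb, te_content_pct)
print(f"r = {r:.3f}")
```

The same function could be applied per TE class (LINE, SINE, LTR, DNA) to compare which classes track genome size. A significance test would additionally require a p-value, which is omitted here.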
Noctuid species also exhibited significant differences in the copy numbers and lineage-specific expansion of TE subfamilies. The SINE/
5S subfamily is one example whose copy number varied greatly among closely related species. The copy number of the SINE/
5S subfamily was only 130-204 in the three species of the genus
Spodoptera, but more than 100,000 in
B. fusca, even though, among the ten species,
B. fusca had the closest phylogenetic relationship to the genus
Spodoptera. This suggests an expansion of SINE/
5S in the genome of
B. fusca. Activity estimation found a propagation peak of SINE/
5S about 6 Mya in
B. fusca (
Figure 4E), suggesting that elements of the SINE/
5S subfamily were probably still active recently. The three species in the genus
Spodoptera allow the investigation of lineage-specific TE propagation among closely related species. We found a clear expansion of the LTR/
Gypsy subfamily specifically in
S. exigua but not in
S. frugiperda and
S. litura. The expansion occurred very recently, with a propagation peak at about 2 Mya, long after the divergence of
S. exigua from other species (
Figure 3F), and thus represents a species-specific expansion event.
We found that SINE/
B2 was a lineage-specific subfamily present only in
T. ni. Since
T. ni diverged first from the other Noctuidae species in the phylogeny, this pattern might result either from a loss of
B2 in other noctuid species, or from an insertion of
B2 specifically into
T. ni through horizontal transfer. We further investigated the SINE/
B2 subfamily in other genomes by comparing the
B2 consensus sequences with the reference genomes of 27 Lepidoptera species (
Table S7) using blastn, but did not find the sequence in any genome other than
T. ni. The
B2 consensus sequence was produced by RepeatModeler based on the published TE library rather than identified by machine learning, so its classification should be reliable. Therefore, it is highly likely that SINE/
B2 was inserted into the
T. ni genome rather than lost in other lineages. The insertion was estimated to date from about 60 Mya, with a propagation peak around 20-30 Mya. However, where the
B2 subfamily came from and how it integrated into the
T. ni genome requires further study.
4.2. TE expansion activity correlated with phylogeny of Noctuidae species
In addition to identifying TEs as the main contributor to genome size in Noctuidae species, we investigated which TE classes and subfamilies were correlated with the phylogeny of Noctuidae. Among the four TE classes, only LINE showed a phylogenetic signal with high confidence, indicating that LINE elements in Noctuidae are essentially vertically inherited. In particular, four LINE subfamilies (CR1, L2, RTE, and R1), all of which were abundant in copy number, showed high correlation with the Noctuidae phylogeny. In contrast, despite the high copy number of the DNA/Helitron subfamily in noctuid genomes, its correlation with the phylogeny was not significant. This was probably because of the different integration mechanism of Helitron elements. Another potential reason is that elements of the Helitron subfamily have been involved in horizontal transfer events in Noctuidae species, which we discuss below.
We further examined whether the expansion of TE classes and subfamilies contributed to the evolution of noctuid genomes. We noted that LINE, LTR, and DNA transposons all had relatively low activity in the genus
Helicoverpa, which is probably why both species in the genus
Helicoverpa had the lowest TE content and the smallest genome sizes. In contrast, LINE, LTR, and DNA transposons all expanded in the genome of
B. fusca (
Figure 6), giving
B. fusca the highest TE content (45.1%) among the ten species. Interestingly, by calculating the rate of change of TEs, we found that only LINE and DNA transposons had expanded very recently (
Figure 4), indicating that these TEs are very likely still active in
B. fusca, especially the LINE/
CR1, LINE/
R1 and LINE/
RTE subfamilies, which showed recent expansion in
B. fusca.
Although LINE and DNA transposons had the largest impact on the genomes of Noctuidae species overall, this was not always the case in the recent evolution of individual species. For example, the activity of TE classes and subfamilies in
S. exigua was substantially different from that in the other two
Spodoptera species. LTR elements expanded in the
S. exigua genome, and the expansion occurred very recently (
Figure 3F), resulting in the highest genomic LTR content (4.22%) among the ten species, in contrast to a reduction of LTR elements in the
S. frugiperda and
S. litura genomes. We identified several HTT events related to LTR elements in the
S. exigua genome (discussed below), which may have contributed to this recent expansion. Thus, LTR appears to be the TE class that had the most important impact on the recent evolution of
S. exigua.
4.3. HTT events in the genomes of the noctuid moths
Despite the essentially homoplasy-free character of TEs, horizontal transfer of TEs (HTT) has been widely reported in insect genomes [
15]. Peccoud’s study found that as many as 2,248 HTT events occurred among 195 insect species in the last 10 million years, which was probably only a tiny fraction of the actual HTT events between insects [
43]. Another study analyzed 460 species of arthropods for horizontal transfer and found significantly more HTT events in Lepidoptera than in other arthropods [
17]. Our study identified a total of 56 possible HTT events among the ten noctuid genomes. A previous study indicated that the higher the copy number of a subfamily, the higher the probability that its members are misclassified as HTT events [
47]. When we calculated the frequency of HTT events per thousand TE copies, we did not observe this trend, suggesting that our results do not suffer from copy number bias. A large-scale study of HTT in insect genomes identified a total of 2,248 HTT events, 1,087 of which were associated with the DNA/
Tc1-Mariner subfamily [
43]. In our study, 17 of the 56 HTTs involved elements in the
Tc1-Mariner subfamily, the highest number among all subfamilies.
Tc1-Mariner elements are short (1-2 kb), which may facilitate horizontal transfer through vectors, resulting in a high frequency of HTT events.
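The copy-number normalization described above can be sketched as follows. This is a minimal illustration only: the function name `htt_per_thousand_copies`, the subfamily labels, and all counts are hypothetical and are not taken from the study.

```python
def htt_per_thousand_copies(htt_events, copy_number):
    """Return the number of HTT events per 1,000 TE copies of a subfamily."""
    if copy_number <= 0:
        raise ValueError("copy_number must be positive")
    return 1000.0 * htt_events / copy_number


# Hypothetical (HTT events, total copy number) per subfamily; NOT study data.
subfamilies = {
    "subfamily_A": (17, 34000),
    "subfamily_B": (5, 2500),
    "subfamily_C": (2, 120000),
}

# Compute the normalized rate for each subfamily.
rates = {
    name: htt_per_thousand_copies(events, copies)
    for name, (events, copies) in subfamilies.items()
}

# List subfamilies from highest to lowest copy number so that any trend of
# rate increasing with copy number would be visible by eye.
for name, (events, copies) in sorted(
    subfamilies.items(), key=lambda item: item[1][1], reverse=True
):
    print(f"{name}: copies={copies}, HTT per 1,000 copies={rates[name]:.3f}")
```

If high-copy subfamilies were systematically misclassified as HTT, the normalized rate would be expected to rise with copy number. In these made-up numbers it does not.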
One HTT event involved an LTR/
Gypsy element transferred between
S. exigua and
M. configurata. Based on the
Gypsy consensus of
S. exigua, we estimated that the divergence of the
Gypsy consensus was 1.38% in
S. exigua, corresponding to an insertion time of 2.37 Mya. This is consistent with the activity estimation of
Gypsy, which showed a propagation burst around 2 Mya in the
S. exigua genome. When TEs insert into a new genome by horizontal transfer, the new host genome is generally unable to immediately inhibit their replication and transposition. If the TEs retain replication capacity in the new genome, this may lead to massive TE replication [
13]. Therefore, we inferred that the horizontally transferred
Gypsy element could have led to the recent mass replication of
Gypsy in the
S. exigua genome, where it accounts for 2.25% of the entire genome, about 4.5-6.4 times that in the closely related species of the same genus.
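The conversion from divergence to insertion time can be illustrated with the commonly used relation T = K / (2r), where K is the divergence and r is the substitution rate. The formula and the rate used below are assumptions for illustration; this section does not state which formula or rate was used. With K = 1.38% and a hypothetical rate of 2.9e-9 substitutions per site per year, the relation gives roughly 2.38 Mya, which is close to the 2.37 Mya reported above.

```python
def insertion_time_mya(divergence, rate_per_site_per_year):
    """Estimate insertion time in million years ago as T = K / (2r).

    divergence: fraction of diverged sites (e.g. 0.0138 for 1.38%).
    rate_per_site_per_year: assumed neutral substitution rate.
    """
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be a fraction between 0 and 1")
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    years = divergence / (2.0 * rate_per_site_per_year)
    return years / 1e6


# K = 1.38% comes from the text; the rate is an ASSUMED illustrative value.
ASSUMED_RATE = 2.9e-9
t = insertion_time_mya(0.0138, ASSUMED_RATE)
print(f"Estimated insertion time: {t:.2f} Mya")
```

The estimate scales inversely with the assumed rate, so a different rate would shift the date proportionally.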
It is noteworthy that, for HTT events identified between closely related species, no matter how strict the threshold, there is still the possibility that the similarity of vertically inherited TEs exceeds the selection threshold. A previous study suggested that the closer the geographic distribution and the closer the phylogenetic relationship of two species, the higher the frequency of horizontal transfer between them [
43]. That study further proposed counting the minimal number of HTT events, restricted to species that diverged more than 40 Mya, to avoid overestimating the number of HTT events. Following this method, we identified a minimum of three HTT events between the noctuid moths and eleven non-noctuid arthropods. These HTT events involved the genomes of five noctuid species:
B. fusca,
M. configurata, and three
Spodoptera species. A previous study suggested that genomes with more HTT events probably had higher TE contents and larger genome sizes [
9]. All five of these noctuids showed relatively larger genome sizes and higher TE contents than the other five species, indicating that HTT events might have shaped the evolution of Noctuidae genomes by leading to TE expansion.
Among the three minimum-number HTT events, one Mariner-related HTT occurred specifically in B. fusca. We further searched for the Mariner sequence of B. fusca in other arthropod genomes and found a highly similar sequence in Cyphomyrmex costatus (Hymenoptera), supporting the horizontal transfer of this element between different insect genomes. However, whether the Mariner element was transferred from M. martensii to the B. fusca genome needs further study with more arthropod genomes.
In conclusion, this study constructed a consensus sequence library for ten Noctuidae species based on multiple methods, significantly improving TE annotation in Noctuidae genomes. By comparing the genomic TE content, TE composition, and propagation activity of TE classes and subfamilies among the ten Noctuidae species, this study provides new insights into the essential contributions of TEs to the genome size variation, genomic diversity, and phylogeny of Noctuidae species. We identified lineage-specific TE subfamilies and recent expansions of TE subfamilies in some Noctuidae species, suggesting that these elements are probably still active in Noctuidae genomes. Moreover, a total of 56 potential HTT events were identified among the noctuid species, and a minimum of three HTT events were identified between the Noctuidae species and 11 non-noctuid arthropod species. The HTT events could account for the recent expansion of the Gypsy subfamily in the S. exigua genome and the species-specific expansion of the Mariner subfamily in B. fusca.