1. Introduction
Cycads are one of the oldest families of living seed plants, dating back to the Late Permian period [
1]. More than 360 species of cycads exist, divided into 10 genera and two families, Cycadaceae and Zemitaceae [
2]. These plants are found in patches throughout tropical and subtropical regions of Asia, Africa, Oceania, and America [
3]. From a morphological perspective, the reproductive systems of cycads are most similar to those of spore plants, making cycads very important in studying the genesis and early evolution of seed plants. From the standpoint of their phylogenetic origin and evolution, cycads have existed and reproduced for at least 280 million years [
4], suffered significant environmental changes, and contain a wealth of genetic data. In terms of species protection, cycads have persisted and flourished until the present, although many plant species have perished due to changes in the earth's environment. Therefore, studying the mechanisms underlying the environmental adaptability of cycads is critical in conservation biology.
There is only one genus of Cycadaceae,
Cycas, which contains roughly 120 species [
4]. This genus contains a large number of species, involves a complicated taxonomy, and covers a wide geographical range.
Cycas are thought to have originated in Tertiary East Asia according to recent biogeographic research [
5]. In China, there are roughly 20 different species of
Cycas, mostly found in the southwest and southeast coasts [
6].
There have been some challenges and disagreements in the categorization of
Cycas species for the following reasons. First, it is challenging to properly analyze the physical traits of
Cycas. Since these species take a long time to reach sexual maturity, there are few flowering plants to be discovered. When combined with variable environments, populations of the same
Cycas species can also differ significantly. Additionally, natural hybridization may exist between different species [
7]. Hill proposed dividing
Cycas into six groups based on long-term observations of cycad plants in the field and integrating reproductive characteristics such as the ovule coat, megaspore leaf shape, and anatomical structure of the seeds: Section Asiorientales, Section Strangeriodes, Section Indosinenses, Section Cycas, Section Panzhihuaenses, and Section Wadeae [
8,
9]. Recently, Zheng et al. proposed 23 species from four sections in China based on distribution and morphological characteristics [
6]. Traditional taxonomy, however, has limited ability to establish species boundaries due to small morphological variations between species induced by either environmental or genetic causes.
The strategy of integrating DNA identification and morphological features was also employed to define
Cycas species, and more accurate findings were achieved. Using molecular sequencing techniques, Xiao and Möller performed a phylogenetic study of 31
Cycas species using the nrDNA ITS gene [
10]. The chloroplast (cp) genome presents substantial advantages for investigations in the field of plant evolutionary biology due to its genetic stability, well-preserved genome structure, and rate of evolutionary change that outpaces that of mitochondria [
11]. Liu et al. used four chloroplast genes and three nuclear genes to conduct a phylogenetic analysis of 104 species and 5 subspecies in the genus
Cycas [
12]. Nonetheless, the exploration of genomic resources within this genus has remained relatively limited, as evidenced by the small number of studies conducted [13-15]. In GenBank database, the collection of complete cp genome sequences for
Cycas species is currently limited to approximately 10 entries.
In this study, the cp genomes of six Cycas species, C. longlingensis, C. longisporophylla, C. guizhouensis, C. ferruginea, C. crassipes, and C. bifiba, were sequenced. Our study encompassed a thorough exploration of the cp genome, encompassing detailed descriptions of its assembly and annotations and the identification of simple sequence repeats (SSRs). Furthermore, we performed phylogenetic analyses of Cycas species based on the coding genes from cp genome sequences, incorporating both our newly sequenced species and those previously published. These findings complement current genetic information on Cycas species and serve as a good reference for Cycas DNA molecular research.
2. Materials and Methods
2.1. Plant Materials and DNA Extraction.
Fresh leaves of six different Cycas species, C. longlingensis, C. longisporophylla, C. guizhouensis, C. ferruginea, C. crassipes, and C. bifiba, were taken from the Guilin Botanical Garden (Guangxi, China; coordinates: N 25°4’14.88’’, E 110°17’57’’) and immediately placed in liquid nitrogen. The total genomic DNA was extracted from fresh leaves (> 1.0 g) using a Magnetic Plant Genomic DNA Kit (TIANGEN Biotech, Beijing, China) following the manufacturer’s instructions. The quality of DNA was evaluated utilizing a TBS-380 Mini-Fluorometer (Invitrogen) and electrophoresis on a 1% agarose gel.
2.2. Chloroplast Genome Sequencing and Assembling.
A total of 1 µg DNA was utilized as an input material for library construction. Using the VAHTS Universal Plus DNA Library Prep Kit for Illumina (Vazyme, Jiangsu, China), we crafted sequencing libraries in accordance with the manufacturer's guidelines while applying index codes to individual sample sequences. Briefly, the DNA sample underwent initial fragmentation into 300-500 bp segments through sonication. Subsequently, the preexisting DNA fragments underwent end-polishing and A-tailing, followed by ligation with full-length adaptors for sequencing. Polymerase chain reaction (PCR) amplification was then performed utilizing a cBot Truseq PE Cluster Kit v3-cBot-HS (Illumina). Lastly, PCR products underwent purification using an AMPure XP system (Beckman Coulter Inc., Brea, CA, USA), their library size distribution was determined using an Agilent 2100 Bioanalyzer, and quantification was carried out via real-time PCR. Using a cBot Cluster Generation System (Illumina Inc.), the indexed samples underwent clustering according to the manufacturer's protocols. Following clustering, the resulting libraries underwent sequencing on an Illumina Novaseq 6000 platform, generating reads with a length of 150 bp in the paired-end configuration.
The raw paired-end reads were subjected to quality assessment using the FastQC v0.11.7 software. Subsequent to quality evaluation, the obtained data were processed through de novo assemblers (Fast-plast,
https://github.com/mrmckain/Fast-Plast or GetOrganelle,
https://github.com/Kinggerm/GetOrganelle) to generate optimal contigs. The cp sequence of
C. szechuanensis (MH341576) was retrieved from GenBank and employed as the reference seed sequence for
C. longlingensis,
C. longisporophylla,
C. ferruginea,
C. crassipes, and
C. bifiba. Additionally, the cp sequence of C. bifida (MW900434) was utilized as the seed sequence for
C. guizhouensis.
Subsequently, the chloroplast (cp) genomes underwent annotation using the PGA software (available at
https://github.com/quxiaojian/PGA) and the Geseq software (accessible via
https://chlorobox.mpimp-golm.mpg.de/geseq.html) with default settings, followed by manual corrections. The resulting gene maps were visualized using the online tool OGDraw v1.2 [
16]. Six newly sequenced complete cp genomes were deposited to GenBank with the following Accession Numbers: Accession Nos.
C. bifiba (OQ862764),
C. crassipes (OQ862765),
C. ferruginea (OQ862766),
C. guizhouensis (OQ862767),
C. longisporophylla (OQ862768), and
C. longlingensis (OQ862769).
2.3. Repeat Sequences and SSRs.
The cp genome sequences of C. szechuanensis (NC_064393.1), C. shiwandashanica (NC_064393.1), and C. segmentifida (NC_064393.1) downloaded from GenBank were coupled with six newly-sequenced cp genomes of Cycas to conduct an analysis of repeat sequences and simple sequence repeats (SSRs). A Perl script called MISA was employed to detect SSRs within the complete cp genome sequences of the nine Cycas species. The thresholds applied for varying lengths of SSRs—mononucleotides, dinucleotides, trinucleotides, tetranucleotides, pentanucleotides, and hexanucleotides—were set to 10, 6, 5, 5, 5, and 5, respectively. Furthermore, the REPuter program was used to identify four categories of repeat sequences: palindromic, forward, reverse, and complement repeats. The recognition of repeat sequences adhered to the following criteria: (1) A Hamming distance of 3; (2) a minimum size of 20 bp; and (3) a sequence identity equal to or greater than 90%.
2.4. Variations and Divergence Hotspot Regions of the cp Genomes.
The mVISTA comparative genomics server was utilized to generate a sequence variation map with annotation of the
C. bifiba cp genome as a reference. Evaluation of IR sequence variations, encompassing features such as expansion and contraction, was conducted through the IRscope online program (
https://irscope.shinyapps.io/irapp/). To identify intergeneric divergence hotspots, sliding window analysis was carried out using the DnaSP v5.10 software [
17]. This analysis involved a window length of 600 bp and a step size of 200 bp.
2.5. Phylogenomic Reconstruction Based on cp Genomes.
We conducted a phylogenomic analysis utilizing six newly sequenced cp genomes of
Cycas species. Additionally, we included eight
Cycas species sourced from GenBank in our analysis. These sequences were employed to construct a phylogenetic tree using the maximum-likelihood (ML) method, with
Encephalartos lehmannii and
Bowenia serrulata defined as outgroups. To reconstruct ML trees, we extracted 86 protein-coding genes from the 19 species. Multiple sequence alignment was accomplished using MAFFT [
18], and selection of the GTR-GAMMA (GTR + G) model [
19] was guided by a model test applying the Bayesian information criterion (BIC) [
20]. The MEGA-X software facilitated the execution of maximum likelihood (ML) trees, with 1,000 bootstrap replicates configured to assess the branch support values. Visualization of the resulting phylogenetic tree was carried out using FigTree v1.4.4.
4. Discussion
The chloroplast genome data provide comprehensive insights for examining plant phylogenetics and analyzing evolutionary relationships [
21,
22]. The abundant data encapsulated within the chloroplast genome render this genome highly suitable as a DNA barcoding tool for species identification [
23].
Cycas, which is critically threatened across the world [
3], has been rarely studied to record its cp genome information. The current work presented the whole cp genomes of six
Cycas species, four of which (
C. longlingensis,
C. longisporophylla,
C. guizhouensis, and
C. crassipes) were reported here for the first time. Then cp genome comparison was conducted with another three species,
C. szechuanensis,
C. shiwandashanica, and
C. segmentifida, to gain insight into the variations between the aforementioned cp genomes. Together with five other
Cycas species, a phylogenetic analysis was performed based on complete cp genomes. Studying the cp genome sequences of these species can increase our biological understanding of
Cycas species evolution.
In general, plastomes exhibit a high degree of conservation in terms of their genome structures, gene orders, and gene contents [
24]. The structural configurations of the entire cp genomes in the six studied
Cycas species closely resemble those found in the majority of higher plants [25-27]. The overall structure is characterized by four distinct regions, including an LSC region spanning from 84,839 to 85,598 bp, an SSC region ranging from 17,559 to 17,687 bp, and two IRs from 31,392 to 31,880 bp each. The comparative examination of six intact cp genomes revealed significant similarities in parameters such as genome length (165,607–167,013 bp), structure, IR/SC borders, and GC content (37.8–38.0%). In addition, the equal number of rRNA, tRNA, and coding genes indicated that the analyzed
Cycas species are highly conserved. Previous reports indicated that GC content exhibits variation across distinct regions of cp genomes, with the IR regions displaying higher GC content due to the inclusion of rRNAs [
25,
28], which was in line with our results.
By comparing the variations in cp genome sequences between distinct taxa, it becomes possible not only to efficiently identify DNA fragments rich in information but also to foster the advancement of techniques for species identification and the exploration of population diversity [
29]. mVISTA and DnaSP6 were applied to evaluate variations in the cp genomes of different
Cycas species, and both methods demonstrated that
Cycas cp genomes were highly conserved. The IR regions were less variable than the LSC and SSC regions, which was consistent with the findings of a prior investigation [
30]. In addition, a prior report indicated higher susceptibility to mutations in non-coding regions compared to coding regions [
31]. In the present study, we observed only high variable regions within IGSs, rather than coding regions, which aligns with this characteristic.
Inheritance of the cp genome is uniparental, and within a given species, there exists a notable degree of variation in SSRs [
32]. Consequently, these SSRs serve as valuable molecular indicators for developmental analyses and species identification purposes [
33]. Moreover, SSRs frequently find applications as genetic markers in investigations pertaining to community genetics and evolutionary research [
34]. Among the repeats, those that showed the greatest enrichment were mononucleotide repeats followed by dinucleotide repeats. Overall, trinucleotide repeats were infrequent across all nine cp genomes. When conducting a comparative assessment of repeat sequences within the cp genomes, the repeat length distribution was the same in
C. segmentifida,
C. longlingensis,
C. longisporophylla, and
C. ferruginea, with an average length of 24.146 bp. In contrast,
C. guizhouensis and
C. crassipes exhibited much longer repeats, with an average of 30.563 bp. Notably, species with similar repeat length distributions of their cp genomes were close in the phylogenetic tree, indicating that large repeats are reliable molecular indicators in evolutionary studies.
Currently, protein-coding genes are commonly used to build phylogenetic trees [
35]. The results of this study revealed the genetic relationships between
Cycas plants. According to Zheng et al.,
C. revoluta and
C. taitungensis belong to Asiorientales, while
C. panzhihuaensis belongs to Panzhihuaenses [
6]. Consistent with this classification, the first two species were also grouped into the same clade in our results and further into a clade with
C. panzhihuaensis. Additionally, Zheng et al. discovered 18 species of Stangerioides in China [
6], but the
C. bifiba,
C. longisporophylla, and
C. longlingensis species analyzed in this study were not included in their research. Here, the phylogenetic tree indicates that these three species should belong to Stangerioides since they are grouped together in the same branch as other species that are part of this section. This categorization is justified for two reasons. Morphologically, the testa coats of this species are yellow to brown, and their microsporangiate cones and microsporophylls are soft to the touch. Geographically, these species are all found in the Guangxi area of China [
36,
37]. In summary, our phylogenetic analysis of
Cycas species relied upon protein-coding genes, which currently constitute the most comprehensive dataset available. This endeavor not only lays theoretical groundwork in this field but also provides the requisite technical details for advancing and effectively utilizing resources derived from
Cycas plants.
Author Contributions
Conceptualization, J.T.; methodology, R.Z.; validation, T.C.; formal analysis, S.Z. and L.P.; investigation, T.D. (sampling), S.C. (sequencing), R.Z. (data analysis); writing—original draft preparation, J.T.; writing—review and editing, T.C.; visualization, L.P. and S.Z.; supervision, X.W.; funding acquisition, X.W. All co-authors have reviewed and approved the final version of the manuscript for publication. All authors have read and agreed to the published version of the manuscript.