Preprint
Article

Chromosome-Level Genome Assembly and Annotation of the Crested Gecko, Correlophus ciliatus, a Lizard Incapable of Tail Regeneration

Altmetrics

Downloads

76

Views

58

Comments

0

This version is not peer-reviewed

Submitted:

26 September 2024

Posted:

30 September 2024

You are already at the latest version

Alerts
Abstract
The vast majority of gecko species are capable of tail regeneration, but singular geckos of Correlophus, Uroplatus, and Nephrurus genera are unable to regrow lost tails. Of these non-regenerative geckos, the crested gecko (Correlophus ciliatus) is distinguished by ready availability, ease of care, high productivity, and hybridization potential. These features make C. ciliatus particularly suited as a model for studying the genetic, molecular, and cellular mechanisms underlying loss of tail regeneration capabilities. We report a contiguous genome of C. ciliatus with a total size of 1.65 Gb, a total of 152 scaffolds, L50 of 6, and N50 of 109 Mb. Repetitive content consists of 40.41% of the genome, and a total of 30,780 genes were annotated. Assembly of the crested gecko genome provides a valuable resource for future comparative genomic studies between non-regenerative and regenerative geckos and other squamate reptiles.
Keywords: 
Subject: Biology and Life Sciences  -   Animal Science, Veterinary Science and Zoology

Introduction

The crested gecko, Correlophus ciliatus, is a lizard species endemic to New Caledonia distinguished by eye and head projections/spines (Figure 1 A, B) and the inability to regenerate amputated tails (Figure 1 C, D). The chromosomes of C. ciliatus are typical of most New Caledonian geckos and exhibit a biarmed, acrocentric 2n=38 karyomorph (Figure 1 E) [1,2]. Crested geckos readily adapt to captivity; as a nocturnal, omnivorous species from a mild, tropical climate, C. ciliatus thrives at “room temperatures” and does not require the expensive lighting or insect diets obligatory in the maintenance of many other lizard species. Since C. ciliatus is able to breed nearly year-round without seasonal simulations, this species is also one of the most straight-forward and productive to breed in captivity.
C. ciliatus is one of only fourteen described gecko species (over 1,850 total) that has lost the ability to regenerate amputated tails (Supplementary Material Table S1). Of these non-regenerative gecko species, only C. ciliatus is readily available within the American and European pet hobbies. Furthermore, C. ciliatus is the only non-regenerative lizard species capable of hybridizing with regenerative relatives, specifically C. sarasinorum, Mniarogekko chahoua, and Rhacodactylus auriculatus. Currently, all other gecko species with sequenced genomes are capable of tail regeneration (personal observation by T. P. L.) [3,4,5,6,7]. The goal of studying a non-regenerative gecko towards identifying gene regions involved in tail regrowth is a main driver for sequencing the C. ciliatus genome. With its ease of care, high productivity, and options for hybridizations, the crested gecko is the ideal model lizard for studying the genetic mechanisms involved in loss of tail regeneration capabilities.

Methods

Sample Collection, PacBio Sequencing and Assembly

Gecko housing, handling, and sample collections were performed according to the guidelines of the Institutional Animal Care and Use Committee at the University of Southern California (protocol 20992). Genomic DNA was obtained from a single whole female C. ciliatus embryo collected from a two-month-old egg incubated at 23°C. The Qiagen Midi Prep Kit was used for the DNA extraction from 94mg of ground embryo, and approximately 100 ug of high molecular weight (HMW) DNA was obtained. Genomic DNA was sequenced using the PacBio Sequel II platform (Table 1). 185.8 gigabase-pairs of PacBio CCS reads were used as inputs to Hifiasm v0.15.4-r347 [8] with default parameters. To estimate the genome size of C. ciliatus, k-mer analysis was conducted on the PacBio CCS read using a range of k values (17, 19, 21, 23, 25, 27, 29 and 31). The estimated genome size was calculated by: (total number of kmers – erroneous kmers) divided by homozygous peak depth, following the methods of Cai et al [9]. (Note: Minimum coverage was defined as the depth of the first trough in the k-mer frequency distribution. K-mers that fell under this minimum coverage were considered erroneous.) Jellyfish v2.2.10 [10] was used to calculate the k-mer frequency using the -C parameter, and GenomeScope v1.0.0 [11] was then used to estimate heterozygosity.
BLAST v2.9.0 [12] results of the Hifiasm output assembly against the NCBI Nucleotide Database were used as inputs for blobtools v1.1.1 [13]. Scaffolds identified as possible contamination (sequences within genomic data that originate from sources other than the intended target organism, C. ciliatus) were removed from the assembly. Blobtools revealed contamination from Actinobacteria (2 contigs, 15 Mb) (Figure 2). Finally, purge_dups v1.2.5 [14] was used to remove haplotigs and contig overlaps.
Dovetail Omni-Libraies were constructed to scaffold initial Hifiasm assemblies. For each Dovetail Omni-C library, nuclear chromatin was fixed with formaldehyde and extracted. Following digestion with DNAse I, chromatin ends were repaired and ligated to biotinylated bridge adapters followed by proximity ligation of adapter-containing ends. After proximity ligation, crosslinks were reversed, and the DNA was purified. Purified DNA was treated to remove free biotin that was not incorporated into ligated DNA fragments. Sequencing libraries were generated using NEBNext Ultra enzymes and Illumina-compatible adapters. Biotin-containing fragments were isolated using streptavidin beads before PCR enrichment of each library. Libraries were sequenced on an Illumina HiSeqX platform to produce approximately 30x sequence coverage.
The draft de novo assembly produced by Hifiasm and Dovetail OmniC library reads were input into HiRise [15], a software pipeline designed specifically for using proximity ligation data to scaffold genome assemblies. Dovetail OmniC library sequences were aligned to the draft assembly using bwa v0.7.17 [16]. The separations (genomic distances between pairs of reads that map within draft scaffolds) of Dovetail OmniC read pairs that mapped within draft scaffolds were analyzed by HiRise to produce a likelihood model for genomic distance between read pairs. This model was used to identify and break putative misjoins (erroneous link between two contigs), to score prospective joins, and make joins above a threshold.

Repeat Content

De novo-based methods were used to identify transposable elements and other repetitive elements. Repetitive content within the C. ciliatus genome was predicted with RepeatModeler v2.0.1 [17]. Repetitive elements were masked using RepeatMasker v4.1.0 [18].

Gene Annotation and BUSCO Analysis

Coding sequences from Anolis carolinensis, Gekko japonicus, Pogona vitticeps, Salvator merianae, and Zootoca vivipara were used to train the initial ab initio model for C. ciliatus using the AUGUSTUS software v 2.5.5 [19]. Six rounds of prediction optimization were performed with AUGUSTUS. The same coding sequences were also used to train a separate ab initio model for C. ciliatus using SNAP v2006-07-28 [20]. Total RNA was extracted from a single whole, two-month-old, female C. ciliatus embryo using the QIAGEN RNeasy Plus Kit following manufacturer protocols. Total RNA was quantified using Qubit RNA Assay and TapeStation 4200. Prior to library prep, DNase treatment was performed, followed by AMPure bead clean up and QIAGEN FastSelect HMR rRNA depletion. Library preparation was done with the NEBNext Ultra II RNA Library Prep Kit following manufacturer protocols. Then these libraries were run on the NovaSeq6000 platform in 2 x 150 bp configuration (Table 2). RNA-Seq reads were mapped onto the genome using the STAR alignment software v2.7 [21], and intron hints were generated with the bam2hints tools within the AUGUSTUS software. MAKER v3.01.03 [22], SNAP and AUGUSTUS (with intron-exon boundary hints provided from bam2hints) were then used to predict gene identities in the repeat-masked reference genome. To help guide the prediction process, Swiss-Prot peptide sequences from the UniProt database [23] were downloaded and used in conjunction with the protein sequences from A. carolinensis, G. japonicus, P. vitticeps, S. merianae, Z. vivipara to generate peptide evidence in the MAKER pipeline. Only gene identities that were predicted by both SNAP and AUGUSTUS softwares were retained in final gene sets.

Results and Discussion

The total assembly size is 1.65 Gb, with a GC content of 45% (Table 3). The estimated genome size by k-mer analysis was 1.52 Gb (Table 3). It is worth noting the approximately two-fold difference between the coverage of the heterozygous peak at 50X, and the homozygous peak at 100X (Figure 3). This indicates a high level of heterozygosity in the genome. The rate of heterozygosity estimated by GenomeScope was approximately 0.51% (Table 4). The contig/scaffold N50 is 109 Mb, and the largest scaffold had a length 169 Mbp (Table 3). 99.54% (1653 Mbp) of the total assembly was scaffolded into 19 chromosome length scaffolds (Figure 4). This number of chromosomal scaffolds is consistent with the number of haploid chromosomes observed in the C. ciliatus karyotype (Figure 1 E).
The repetitive content consisted of 40.41% of the C. ciliatus genome, with a total length of 663.95 Mbp (Table 5). DNA transposons consisted of 1.39%, while LINE, SINE, and LTR transposons consisted of 14.75%, 6.42%, and 1.08% of the genome respectively. The de novo gene prediction resulted in a total of 30,780 protein coding genes (Table 6), and of the identified genes, 20,429 (66.37%) had an AED score ≤ 0.5 (Supplementary Material Figure S1).

Data Validation and Quality Control

BUSCO v5.7.1 [24] was used to assess the quality and completeness of the C. ciliatus genome. BUSCO analysis was performed using the eukaryota_odb10 dataset. The C. ciliatus genome captured 99.6% of BUSCOs in the eukaryota_od10 dataset (Table 7), indicating the high completeness of the assembly.

Supplementary Materials

The following supporting information can be downloaded at the website of this paper posted on Preprints.org.

Author Contributions

TL conceived and supervised the project and provided the crested gecko samples. MG analyzed genome assembly performed repeat annotation and BUSCO analysis. MG drafted the manuscript. TL revised the manuscript. ZP maintained animal colonies. All authors read and approved the final manuscript. All authors read and approved the final manuscript.

Availability of supporting data

Supporting datasets, including annotation are available at GigaDB. Raw sequencing reads and OmniC Library reads have been deposited in the SRA (Sequence Read Archive) database under Bioproject ID PRJNA1091669. RNA-Seq reads have been deposited under BioProject ID PRJNA1128839. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession JBBPXQ000000000 and Biosample accession number is SAMN40604022. The version described in this paper is version JBBPXQ010000000.

Acknowledgements

We would like to acknowledge funding from NIH R01GM115444 and support from the CIRM COMPASS Award (EDUC5-13853). Special thanks to Dr. Andrew McMahon, the Department of Stem Cell Biology and Regenerative Medicine, and the Eli and Edythe Broad Center for Regenerative Medicine and Stem Cell Research at University of Southern California for genome sequencing support.

Conflicts of Interest

The authors declare that they have no competing interests.

List of Abbreviations

BUSCO: Benchmarking Universal Single-Copy Orthologue, RNA-Seq: RNA Sequencing, BLASTN: Basic Local Alignment Search Tool (for nucleotides), bp: base pairs, Mbp: Mega base pairs, TE: transposable element; hmwDNA: high molecular weight DNA, AED: Annotation Edit Distance

References

  1. King, M. Chromosomal Evolution in the Diplodactylinae (Gekkonidae, Reptilia) .1. Evolutionary Relationships and Patterns of Change. Australian Journal of Zoology 1987, 35, 507–531. [Google Scholar] [CrossRef]
  2. Mengden, M.K.a.G. Chromosomal Evolution in the Diplodactylinae (Gekkonidae, Reptilia) .2. Chromosomal Variability Between New Caledonian Species. Australian Journal of Zoology 1990, 38, 219–226. [Google Scholar]
  3. Xiong, Z.; et al. Draft genome of the leopard gecko, Eublepharis macularius. Gigascience 2016, 5, 47. [Google Scholar] [CrossRef]
  4. Liu, Y.; et al. Gekko japonicus genome reveals evolution of adhesive toe pads and tail regeneration. Nat Commun 2015, 6, 10033. [Google Scholar] [CrossRef]
  5. Hara, Y.; et al. Madagascar ground gecko genome analysis characterizes asymmetric fates of duplicated genes. BMC Biol 2018, 16, 40. [Google Scholar] [CrossRef]
  6. Pinto, B.J.; et al. Chromosome-Level Genome Assembly Reveals Dynamic Sex Chromosomes in Neotropical Leaf-Litter Geckos (Sphaerodactylidae: Sphaerodactylus). J Hered 2022, 113, 272–287. [Google Scholar] [CrossRef]
  7. Pinto, B.J.; et al. The revised reference genome of the leopard gecko (Eublepharis macularius) provides insight into the considerations of genome phasing and assembly. J Hered 2023, 114, 513–520. [Google Scholar] [CrossRef]
  8. Cheng, H.; Concepcion, G.T.; Feng, X.; et al. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 2021, 18, 170–175. [Google Scholar] [CrossRef]
  9. Cai, H.; Li, Q.; Fang, X.; Li, J.; Curtis, N.E.; Altenburger, A.; et al. A draft genome assembly of thesolar-powered sea slug Elysia chlorotica. Sci. Data 2019, 6, 190022. [Google Scholar] [CrossRef]
  10. Marcais, G.; Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 2011, 27, 764–770. [Google Scholar] [CrossRef]
  11. Vurture, G.W.; Sedlazeck, F.J.; Nattestad, M.; Underwood, C.J.; Fang, H.; Gurtowski, J.; et al. GenomeScope:fast reference-free genome profiling from short reads. Bioinformatics 2017, 33, 2202–2204. [Google Scholar] [CrossRef] [PubMed]
  12. Mount, D.W. Using the basic local alignment search tool (BLAST). CSH Protoc 2007, pdb.top17. [Google Scholar] [CrossRef]
  13. Laetsch, D.R.; Blaxter, M.L. BlobTools: Interrogation of genome assemblies. F1000Research 2017, 6, 1287. [Google Scholar] [CrossRef]
  14. Guan, D.; Guan, D.; McCarthy, S.A.; et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 2020, 36, 2896–2898. [Google Scholar] [CrossRef]
  15. Putnam, N.H.; O’Connell, B.L.; Stites, J.C.; et al. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome research 2016, 26, 342–350. [Google Scholar] [CrossRef]
  16. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv 2013, arXiv:13033997. [Google Scholar]
  17. Smit, A.; Hubley, R.; Green, P. RepeatModeler Open-1.0. 2008–2015. Available online: http://www.repeatmasker.org.
  18. Smit, A.F.A.; Hubley, R.; Green, P. RepeatMasker Open-4.0. 2013–2015. Available online: http://www.repeatmasker.org.
  19. Stanke, M.; Diekhans, M.; Baertsch, R.; Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 2008. [CrossRef]
  20. Korf, I. Gene finding in novel genomes. BMC Bioinformatics 2004. [CrossRef]
  21. Dobin, A.; Davis, C.A.; Schlesinger, F.; et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 2013, 29, 15–21. [Google Scholar] [CrossRef]
  22. Cantarel, B.L.; Korf, I.; Robb, S.M.C.; et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 2008, 18, 188–196. [Google Scholar] [CrossRef]
  23. The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucl. Acids Res 2019. [Google Scholar] [CrossRef]
  24. Manni, M.; Berkeley, M.R.; Seppey, M.; et al. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol. Biol. Evol. 2021, 38, 4647–4654. [Google Scholar] [CrossRef] [PubMed]
Figure 1. (A) Example of a crested gecko (Correlophus ciliatus). (B) C. ciliatus head detail showing head crest spines (white arrow heads) and “eyelash” spines (black arrow heads). (C, D) Representative gross anatomy images of C. ciliatus and C. sarasinorum tails 26 days post-amputation. C. ciliatus tails do not regenerate like those of related gecko species, including C. sarasinorum. Bar = 0.25 cm. (E) C. ciliatus karyotype (2n=38). Karyotype was prepared from C. ciliatus embryonic fibroblasts by the Molecular Cytogenetics Laboratory, Department of Veterinary Integrated Biosciences, Texas A&M University.
Figure 1. (A) Example of a crested gecko (Correlophus ciliatus). (B) C. ciliatus head detail showing head crest spines (white arrow heads) and “eyelash” spines (black arrow heads). (C, D) Representative gross anatomy images of C. ciliatus and C. sarasinorum tails 26 days post-amputation. C. ciliatus tails do not regenerate like those of related gecko species, including C. sarasinorum. Bar = 0.25 cm. (E) C. ciliatus karyotype (2n=38). Karyotype was prepared from C. ciliatus embryonic fibroblasts by the Molecular Cytogenetics Laboratory, Department of Veterinary Integrated Biosciences, Texas A&M University.
Preprints 119518 g001
Figure 2. Blobtools plot showing taxonomic categories for different scaffolds (blue: Chordata, grey: ‘no hits’, orange: Actinobacteria), scaffold wide coverage, and GC content. Scaffolds under Actinobacteria were removed from the final genome assembly.
Figure 2. Blobtools plot showing taxonomic categories for different scaffolds (blue: Chordata, grey: ‘no hits’, orange: Actinobacteria), scaffold wide coverage, and GC content. Scaffolds under Actinobacteria were removed from the final genome assembly.
Preprints 119518 g002
Figure 3. 21-mer frequency distribution of the C. ciliatus genome. The first peak at 53X is the heterozygous peak, while the second peak at 107X is the homozygous peak.
Figure 3. 21-mer frequency distribution of the C. ciliatus genome. The first peak at 53X is the heterozygous peak, while the second peak at 107X is the homozygous peak.
Preprints 119518 g003
Figure 4. Hi-C contact plot of 19 chromosomal scaffolds, along with unassigned scaffolds.
Figure 4. Hi-C contact plot of 19 chromosomal scaffolds, along with unassigned scaffolds.
Preprints 119518 g004
Table 1. General statistics of the raw sequencing reads used for C. ciliatus assembly.
Table 1. General statistics of the raw sequencing reads used for C. ciliatus assembly.
Sample Library Type Sequencing Type Platform Number of Reads Coverage
PacBio CCS Long Reads WGS PacBio Sequel II 13,182,509 62x
Table 2. Summary of RNA-Seq samples statistics.
Table 2. Summary of RNA-Seq samples statistics.
Sample ID Library Type Sequencing Type Platform Read length (bp)
geckoembryo1 Short Reads RNA Sequencing Illumina Novaseq 6000 (PE) 150
geckoembryo2 Short Reads RNA Sequencing Illumina Novaseq 6000 (PE) 150
geckoembryo3 Short Reads RNA Sequencing Illumina Novaseq 6000 (PE) 150
geckoembryo4 Short Reads RNA Sequencing Illumina Novaseq 6000 (PE) 150
geckoembryo5 Short Reads RNA Sequencing Illumina Novaseq 6000 (PE) 150
geckoembryo6 Short Reads RNA Sequencing Illumina Novaseq 6000 (PE) 150
Table 3. C. ciliatus genome assembly statistics.
Table 3. C. ciliatus genome assembly statistics.
Contig Scaffold
Largest scaffold length (bp) 169,053,634 169,053,634
N90 (bp) 26,447,670 51,189,016
N50 (bp) 109,210,969 109,210,969
L90 6 19
L50 6 14
Number > 1kbp 163 152
Number of N’s per 100 kbp 0.00 0.07
GC Content (%) 45 45
Total Size (bp) 1,653,058,530 1,653,059,630
Table 4. Estimation of genome size and heterozygosity of C. ciliatus assembly.
Table 4. Estimation of genome size and heterozygosity of C. ciliatus assembly.
k Total number of k-mers Minimum coverage (X) Number of erroneous k-mers Homozygous peak Heterozygous peak Estimated genome size (Gb) Estimated heterozygosity (%)
17 160830997409 23 1309311311 110 55 1.45 0.46
19 163207828191 21 2876102102 108 53 1.48 0.51
21 164978566184 21 3624290072 107 53 1.51 0.52
23 166415784027 21 4147151401 107 53 1.52 0.52
25 167625136836 21 4629779608 107 52 1.52 0.52
27 168655291501 21 5100636735 106 52 1.54 0.51
29 169546294955 21 5566480734 106 52 1.55 0.50
31 170331212990 21 6029874685 106 52 1.55 0.50
Table 5. Summary of Repetitive Content and Transposable Elements of C. ciliatus genome assembly.
Table 5. Summary of Repetitive Content and Transposable Elements of C. ciliatus genome assembly.
Type Number of Elements Length (bp) % of genome
DNA 173526 23044151 1.39
LINE 789302 243834212 14.75
SINE 652803 106143112 6.42
LTR 58109 17890933 1.08
Other 889 381767 0.02
Unknown 1902327 272947662 16.51
Small RNA 2588 3815594 0.23
Satellites 1039 952457 0.06
Simple repeats 292097 14087986 0.85
Low complexity 25303 1434776 0.09
Total 3897983 684532650 40.41
Table 6. De Novo Gene Prediction Metrics.
Table 6. De Novo Gene Prediction Metrics.
Total Number of Genes 30,780
Total coding region (bp) 39,760,070
Average length of genes (bp) 1,291.75
Number of single-exon genes 1,461
Table 7. BUSCO analysis summary of C. ciliatus genome.
Table 7. BUSCO analysis summary of C. ciliatus genome.
BUSCO Category Value Percent (%)
Complete BUSCOs 254 99.6
Complete and single-copy BUSCOs 249 97.6
Complete and duplicated BUSCOs 5 2
Fragmented BUSCOs 1 0.4
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Prerpints.org logo

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

© 2024 MDPI (Basel, Switzerland) unless otherwise stated