Introduction
Elapidae is a family of snakes divided into three subfamilies (Bungarinae, Elapinae and Notechinae), with 44 genera and around 186 described species distributed widely [
1] The front of the mouth of an elapid has permanently erect tusks, which are his distinguishing features. Elapids include terrestrial and sea snakes. Terrestrial elapids, a family of venomous snakes, are distributed across the globe in tropical and subtropical regions, with most species inhabiting the Southern Hemisphere. Elapid sea snakes are mainly distributed in the Indian Ocean and the Southwest Pacific Ocean [
2]
The Chinese cobra, or
Naja atra (NCBI: txid8656) (
Figure 1), is a species of cobra from the family Elapidae. Chinese cobras are usually between 1.2 and 1.5 m long [
3] and they are among the most prevalent cobra species in China. The Chinese cobra likes to inhibit plains, hills and low mountains [
4] Humans often encounter Chinese cobras, although these snakes usually escape to avoid confrontation with humans. Chinese cobras can be observed hunting during daylight hours from March to October and up to 2-3 hours after sunset at temperatures of 20-32°C [
5] They have a widely varied diet and prey on rodents, frogs, toads and other snakes.
The Chinese cobra is highly poisonous, its venom consisting mainly of postsynaptic neurotoxins and cardiotoxins [
6] Their venom offers them protection from predation to a certain extent; however, populations of Chinese cobra have declined by 30% to 50% due to habitat loss and hunting by humans. The venom of Chinese cobras can be used to extract anti-cobra snake venom, which is used to treat cobra snake bites. Although the Chinese cobra is currently listed as a Vulnerable species on the International Union for Conservation of Nature Red List [
7] its numbers in the wild have declined from Vulnerable to Endangered due to continued hunting.
Main Content
Context
Snakebite is a serious threat to human life as it kills around 100,000 people annually. Genome-enabled research of toxin genes may facilitate the development of effective antivenoms. Here, we present a highly continuous reference genome assembly of
N. atra. While there is a reference genome for the Indian cobra (
Naja naja) [
20], this is the first for the Chinese cobra. This resource may also provide valuable information for the conservation of this vulnerable species, which can be used for targeted protection and breeding.
Methods
The detailed methods used in this study are available via a protocol collection hosted in protocols.io [
9] (
Figure 2).
Sample collection and sequencing
The N. atra sample used in this study was captured in Huangshan, Anhui, China, in 2021. After collection, the specimen was quickly frozen to -80°C using drikold dry ice for storage and transport. Methods for DNA extraction, library construction and sequencing were identical those used by Liu et al in a previous study.
Sample collection, experiments and research design were all authorized by the Institutional Review Board of BGI (BGI-IRB E22001). In this research, all the procedures have been operated abiding to the guidelines from BGI-IRB strictly.
Genome survey, assembly, annotation and assessment
The single-tube long fragment read sequencing data were assembled using Supernova (v2.1.1, RRID:SCR_016756) [
10]. NextPolish (v1.0.5) [
10] was then used to perform a second round of correction and a third round of polishing of this assembly using the Whole Genome Sequencing data. To get a haploid representation of the genome, duplicates were purged from the genome using the purge_dups pipeline (RRID:SCR_021173) [
11]. The completeness of the genome was evaluated using sets of BUSCO (v5.2.2, RRID:SCR_015008) with genome mode and lineage data from vertebrata_odb10 [
12].
In order to detect the presence of known repeat elements in the genome of the many-banded P. mucosa, the following approach was employed. To identify the known repetitive elements in the genome of the many-banded krait, we used Tandem repeats Finder [
13], LTR_Finder (RRID:SCR_015247) [
14] and RepeatModeler (v2.0.1, RRID:SCR_015027) [
15]. RepeatMasker (v3.3.0, RRID:SCR_012954) [
16] and RepeatProteinMask v3.3.0 [
17] were used to search the genome sequences for known repeat elements. The BRAKER2 pipeline (RRID:SCR_018964) [
18] was used for gene prediction. Then, the gene sets were aligned against several known databases, including SwissProt [
19], TrEMBL [
19], Kyoto encyclopedia of genes and genomes (KEGG) [
8], gene ontology (GO), and the Non-Redundant Protein Sequence Database [
8] database.
Results
We present a draft genome sequence of
N. atra. The size of this genome is 1.67 Gb (
Table 1), similar to the previously published 1.79 Gb genome of
N. naja [
8]. The scaffold N50 length is 234.17 Kb, and the CG content reached 37.8%. The maximal scaffold length is 2,929,773 bp, demonstrating that the reference is highly continuous according to the characteristics of the genome sequence. In addition, the integrity of the genome was assessed at 84.1% using BUSCO (
Figure 3).
In the
N. atra genome, the content of repetitive elements is up to 40.26%, and the total length of the genome is 672 Mb (
Table 2,
Table 3). After we counted all repeat elements, we found that long interspersed nuclear elements (LINEs) accounted for 30.63%, long terminal repeats (LTRs) accounted for 14.03% and DNA accounted for 4.27% (
Figure 4).
Finally, 29,063 functional genes were annotated. Through KEGG annotation, we found that the genes related to signal transduction are essential in
N. atra (
Figure 5). Furthermore, through a pathway enrichment analysis, we found that the number of Human Diseases pathways is the highest. Environmental Information Processing and Organismal systems also account for a relatively large proportion. According to the annotation and enrichment in the GO database, 6,292 genes are enriched in cellular process and 6,734 in binding.
Data Availability
The data supporting the findings of this study have been deposited into the CNGB Sequence Archive (or CNSA) of China National GeneBank DataBase (or CNGBdb) with the accession number CNP0004141. Raw reads are available in the SRA via bioproject PRJNA955401. Additional data is in the GigaDB repository [
21].
Editor’s note
This paper is part of a series of Data Release papers presenting the genomes of different snake species [
22].
Ethics approval and consent to participate
The authors declare that ethical approval was not required for this type of research.
Author Contributions
TL designed and initiated the project. JW performed DNA extraction, library construction and data analysis. JW wrote the manuscript. All authors read and approved the final manuscript.
Funding
Our project was financially supported by the Guangdong Provincial Key Laboratory of Genome Read and Write (grant no. 2017B030301011). This work was also supported by China National GeneBank (CNGB).
Acknowledgments
Yunnan Agricultural University collected the samples.
Conflicts of Interest
The authors declare no conflict of financial interests.
Abbreviations
GO, gene ontology; KEGG, Kyoto encyclopedia of genes and genomes; LINE, long interspersed nuclear element; LTR, long terminal repeat; SINE, short interspersed nuclear element; TE, transposable element.
References
- Slowinski JB, Keogh JS (April 2000). "Phylogenetic relationships of elapid snakes based on cytochrome b mtDNA sequences". Molecular Phylogenetics and Evolution. 15 (1): 157–64. [CrossRef]
- Chan, S. (2006). A field guide to the venomous land snakes of Hong Kong. Cosmos Books Ltd., Hong Kong. ISBN 988-211-326-5.
- Snake of medical importance. Singapore: Venom and toxins research group. 1990. ISBN 9971-62-217-3.
- "Naja atra – General Details, Taxonomy and Biology, Venom, Clinical Effects, Treatment, First Aid, Antivenoms". WCH Clinical Toxinology Resource. University of Adelaide. Retrieved 18 December 2011.
- Zhao, EM; Adler, K (1993). Herpetology of China. United States: Society for the Study of Amphibians and Reptiles. ISBN 0-916984-28-1.
- Wang, AH; Yang, CC (10 September 1981). "Crystallographic studies of snake venom proteins from Taiwan cobra (Naja nana atra). Cardiotoxin-analogue III and phospholipase A2". Journal of Biological Chemistry. 256 (17): 9279–9282. [CrossRef]
- "Naja atra". IUCN Red List of Threatened Species. 2014: e.T192109A2040894. [CrossRef]
- Suryamohan K, Krishnankutty SP, Guillory J, Jevit M, Schröder MS, Wu M, Kuriakose B, Mathew OK, Perumal RC, Koludarov I, Goldstein LD, Senger K, Dixon MD, Velayutham D, Vargas D, Chaudhuri S, Muraleedharan M, Goel R, Chen YJ, Ratan A, Liu P, Faherty B, de la Rosa G, Shibata H, Baca M, Sagolla M, Ziai J, Wright GA, Vucic D, Mohan S, Antony A, Stinson J, Kirkpatrick DS, Hannoush RN, Durinck S, Modrusan Z, Stawiski EW, Wiley K, Raudsepp T, Kini RM, Zachariah A, Seshagiri S. The Indian cobra reference genome and transcriptome enables comprehensive identification of venom toxins. Nat Genet. 2020 Jan;52(1):106-117. [CrossRef]
- Liu B, Cui L, Deng Z, Ma Y, Yang D, Gong Y, et al. Protocols for the assembly and annotation of snake genomes. 2023. Protocols.io. [CrossRef]
- Guan D, McCarthy SA, Wood J, Howe K, Wang Y and Durbin R. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 2020;36 9:2896-8. [CrossRef]
- Wick RR, Holt KE. Benchmarking of long-read assemblers for prokaryote whole genome sequencing.F1000Research, 2019; 8: 2138. [CrossRef]
- Benson and G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research. 27 2:573-80. [CrossRef]
- Zhao X and Hao W. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Research. 2007 35(suppl_2): W265–W268. [CrossRef]
- Smit A, Hubley R and Green P. RepeatModeler Open-1.0. 2008–2015. Seattle, USA: Institute for Systems Biology. Available from: https://www.repeatmasker org Last Accessed May. 2015;1:2018.
- Tarailo-Graovac M and Chen N. Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences. Current protocols in bioinformatics / editoral board, Andreas D Baxevanis [et al]. 2009;Chapter 4 Unit 4:Unit 4.10.
- Tempel S. Using and understanding RepeatMasker. Mobile Genetic Elements. Springer; 2012. p. 29-51. [CrossRef]
- Bruna T, Hoff KJ, Lomsadze A, Stanke M and Borodovsky M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics and Bioinformatics. 2021;3 1:lqaa108. [CrossRef]
- Amos B and Rolf A. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research. 2000; 1:45. [CrossRef]
- Pitk, E. KEGG database. Novartis Foundation Symposium. 2006;247:91-103.
- Kushal Suryamohan, Sajesh P. Krishnankutty, Joseph Guillory, Matthew Jevit, Markus S. Schröder, Meng Wu, Boney Kuriakose The Indian cobra reference genome and transcriptome enables comprehensive identification of venom toxins. [CrossRef]
- Cite GigaDB DOI when ready.
- Snake Genomes. GigaByte. 2023. [CrossRef]
|
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).