To provide a basis for studying the origins and mechanisms of the survival strategies and pathogenicity of
L. theobromae,
strain LTTK16-3, isolated from a Chinese hickory tree (Linan cultivar) in Linan, Zhejiang Province, China (Zhuang et al., 2021), was used for genome sequencing.
The estimated genome size of L. theobromae was 42.52 Mb based on 21-mer analysis, and the k-mer distribution followed a Poisson distribution with low heterozygosity (<0.5%) (
Figure 1A). As shown in
Table 1, approximately 6 Gb of ONT reads were obtained (read coverage 139×). The
de novo genome assembly of strain LTTK16-3 was 42.82 Mb in size and comprised 10 contigs, with a contig N50 of 5.67 Mb and a maximum contig length of 7.93 Mb (
Table 1). Assessment of assembly completeness with BUSCO v5.12 (
https://busco.ezlab.org/) showed that the assembly of strain LTTK16-3 contained 98.71% complete orthologs at the
Ascomycota level (n=1706) (
Figure 1B and
Table 1). Telomeric repeats (5′-TTAGGG-3′ / 5′-CCCTAA-3′) were searched for in the start and end regions of the contigs; the assembly of strain LTTK16-3 contained 6 contigs with a (TTAGGG)n start and 5 contigs with a (CCCTAA)n end, and 3 contigs reached the telomere-to-telomere (T2T) chromosomal level (
Table 1). Repeat masking before gene prediction showed that the repeat content of strain LTTK16-3 was 3.37% (
Table 1,
Figure 1B). Gene prediction identified a total of 12,516 protein-coding genes in the repeat-masked genome assembly of strain LTTK16-3 (
Table 1). Additionally, as shown in
Figure 2, gene density ranged from 1 to 8 genes per 100 kilobases (kb) across the chromosomes, and the GC content of the whole genome was 54.57% (Table 1,
Figure 2). Intra-genomic synteny analysis detected only 9 syntenic blocks containing 75 pairs of homoeologous genes in the genome of strain LTTK16-3, which is consistent with its relatively low genomic heterozygosity (
Figure 1A and
Figure 2). The functional annotation summarized in
Table 1 showed that strain LTTK16-3 contained 2,457 pathogen-host interaction genes, 237 carbohydrate-active enzymes, and 190 cytochrome P450 enzymes. Additionally, a total of 715 putative secreted proteins were identified using our previously described pipeline (
Table 1) [
16] and 51 secondary metabolite biosynthetic genes (SMBGs) were identified using antiSMASH v5.2.0 (
https://fungismash.secondarymetabolites.org/) (
Table 1) [
17,
18].
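
The assembly statistics reported above (contig N50, GC content, and telomeric repeats at contig ends) can be illustrated with a minimal Python sketch. This is not the pipeline used in this study. The function names, the 100-bp search window, and the minimum of three tandem repeat units are hypothetical choices made only for illustration, and either motif, (TTAGGG)n or (CCCTAA)n, is accepted at both contig ends.

```python
import re


def n50(lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def gc_percent(seq):
    """Return the GC content (%) over unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def telomere_ends(seq, window=100, min_repeats=3):
    """Report whether the first and last `window` bases of a contig
    contain at least `min_repeats` tandem copies of TTAGGG or CCCTAA.
    Both thresholds are illustrative assumptions, not published values."""
    pattern = re.compile(
        r"(?:TTAGGG){%d,}|(?:CCCTAA){%d,}" % (min_repeats, min_repeats)
    )
    seq = seq.upper()
    has_start = bool(pattern.search(seq[:window]))
    has_end = bool(pattern.search(seq[-window:]))
    return has_start, has_end


if __name__ == "__main__":
    # Toy contig with telomeric repeats at both ends (a "T2T" contig).
    toy = "TTAGGG" * 4 + "ACGT" * 50 + "CCCTAA" * 4
    print(n50([10, 8, 5, 3, 2]))
    print(gc_percent("GGCCAATT"))
    print(telomere_ends(toy))
```

Under these assumptions, a contig would be counted as reaching the T2T level only when both values returned by `telomere_ends` are true.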