1. Introduction
The primary aim of modern agriculture is to meet the food and raw material demands of the growing human population. However, the yields of cultivated plants depend on numerous factors [1, 2]. Fungal diseases of plants are often the primary cause of crop losses [3]. Thus, susceptibility to various diseases can threaten both the harvest and economic profits. The Colletotrichum genus is pathogenic to different plant species and often demonstrates severe virulence [4, 5, 6].
Colletotrichum lini is the causative agent of flax anthracnose [7]. The pathogen can reside in untreated flax seeds and initiate infection in seedlings and mature plants [7]. A mature plant infected with C. lini shows stem canker and leaf spotting, whereas for seedlings the infection can be fatal [8]. Thus, harvest failure can affect the production of the two main flax products, oil and fiber. For this reason, multifaceted studies of the fungus are highly important for anthracnose management.
Pathogenic Colletotrichum species have three main lifestyles: biotrophic (hemibiotrophic), necrotrophic, and quiescent [9]. The Colletotrichum genus generally lacks true biotrophic species [9]. Its representatives are usually considered hemibiotrophic, as they establish a necrotrophic stage after the biotrophic one [10]. Biotrophic fungi suppress the defense mechanisms of the host and mask their hyphae to obtain nutrients from the host [11]. This stage can be indispensable for the further establishment of infection and the death of host cells [10]. The necrotrophic stage involves the secretion of fungal toxins and enzymes to absorb nutrients from dead host cells [11]. The quiescent stage is a period of fungal dormancy that lasts until a signal from the surrounding environment is detected [9]. Then, the fungus can complete its disease cycle [9]. Along with pathogenic species, endophytic Colletotrichum strains also occur [12, 13]. Endophytes live in plant tissues and obtain nutrition from the plant without causing disease symptoms. Both Colletotrichum endophytes and pathogens produce metabolites with useful bioactivities [14, 15].
Colletotrichum representatives are assigned to species complexes according to intraspecific and interspecific differences in phenotype and genotype [11]. To discriminate between fungal species, genetic barcodes have been applied, e.g., GAPDH, HIS3, APN2, MAT1-2-1, GAP2-IGS, ACT, CHS-1, nrITS, and TUB2 [16]. However, there is no single universal barcode for all Colletotrichum species, as the barcodes differ in efficiency across species complexes [17, 18]. For example, an ITS-based approach coupled with molecular characterization was applied to Colletotrichum isolates from strawberry tissues. However, the authors observed no association with geographic origin, presence of symptoms, plant species, or plant parts. In addition, the ITS marker failed to provide enough resolution to differentiate between C. gloeosporioides isolates [18]. Therefore, multi-locus analysis can be used to obtain reliable results [16]. Liu et al. constructed a genome tree for 94 Colletotrichum species. The analysis of 1893 single-copy orthologs made it possible to assign the studied species to a range of species complexes [19].
Nevertheless, the most comprehensive information can be extracted from full genomic sequences of Colletotrichum species [20]. Comparative genomic analysis assists in studying the origins of pathogenicity and virulence [21]. For instance, Colletotrichum species possess a suite of potential pathogenicity genes, including effectors and CAZymes [22]. Comparing the gene repertoires of fungal species can shed light on the differences in the degree of pathogenicity among fungal isolates. Meanwhile, horizontal gene transfer events can play an important role in the evolution of pathogenicity. For instance, in C. musae, the analysis of mini-chromosome sequences revealed a set of genes that can undergo horizontal gene transfer [23].
In this study, we obtained annotated genomes of three C. lini strains of different virulence. The comparative analysis of the obtained assemblies revealed a difference in effector gene content, a chromosome rearrangement, and the absence of a possible pathogenicity chromosome in the genome of the moderately virulent strain. The obtained data are a valuable source of information on the pathogenicity determinants of the flax anthracnose pathogen. Further in-depth research on C. lini genomes may suggest approaches to breeding anthracnose-resistant flax varieties.
4. Discussion
Colletotrichum species are widely distributed plant pathogens that cause significant economic losses. Members of the genus are actively studied, including at the level of complete genomes. At the time of writing, 270 assemblies of Colletotrichum species had been deposited in the NCBI Genome database (genome sizes are about 50-60 Mb). Advances in long-read sequencing technologies have made it possible to obtain high-quality genome assemblies of Colletotrichum species [53, 54]. Such high-quality genomes have become the basis for further molecular genetic studies. Genomics and transcriptomics of Colletotrichum species have provided valuable information on the genes that regulate their life cycle and their ability to produce proteins and secondary metabolites that damage plant cells [55, 56, 57, 58, 59, 60, 61]. To identify the molecular genetic factors that determine pathogenicity, special attention is paid to the interaction between Colletotrichum species and their hosts [62, 63, 64]. Using high-quality genome assemblies of Colletotrichum species, a range of pathogenicity-associated genome regions have been identified, including rapidly evolving regions in telomeres, repeat-rich pathogenicity minichromosomes, clusters of effector genes, and a number of genes co-expressed upon infection of a host [50, 65, 66, 67].
The causative agent of flax anthracnose, C. lini (syn. C. linicola), has received little attention in molecular genetic studies. So far, the species has been studied only using DNA markers [25, 68, 69, 70]. In this study, we sequenced the genomes of three C. lini strains of different virulence and conducted a comparative analysis of the obtained genomes to reveal pathogenicity-associated factors. To limit the influence of confounding factors on the subsequent genomic analysis, we studied strains with similar morphological characteristics. The strains represented three degrees of virulence: low, medium, and high.
Combining long ONT reads with short, highly accurate Illumina reads makes it possible to obtain high-quality genomes of fungal pathogens [71, 72, 73]. In this study, raw ONT reads provided ~140× to 170× genome coverage (assuming a genome size of ~50 Mb) with a read N50 of ~5 to 12 kb. Coverage with Illumina data ranged from ~60× to 150×. To construct the most contiguous and complete assemblies, two approaches were tested on the highly virulent strain #390-1. The first was to construct a draft assembly from long reads and polish it with both long reads and short accurate reads. The second relied on hybrid assembly software that takes both ONT and Illumina data as input. We used recently developed tools that had demonstrated optimal results in our previous studies [24, 25]. The most contiguous assembly was obtained using Canu: 31 contigs, N50 = 5.2 Mb, L50 = 5. However, its BUSCO completeness (92.1%) was lower than that of the assemblies obtained with the hybrid tools: MaSuRCA and Unicycler assembled genomes with a BUSCO completeness of 95.9%. Since polishing can increase completeness, the Canu draft assembly was considered optimal. Following the scheme that showed the best results in our previous studies, the chosen draft assembly was polished with two rounds of Racon followed by Medaka (ONT reads) and then with POLCA (Illumina reads) [24, 25]. As a result, the BUSCO completeness of the assembly rose from 92.1% to 96.7%, exceeding the value achieved by MaSuRCA and Unicycler. Thus, the Canu – Racon ×2 – Medaka – POLCA scheme allowed us to assemble a contiguous and complete genome. The same scheme was employed to assemble the genomes of strains #757 and #771. The final genomes consisted of 26-32 contigs, had N50 values in the megabase range (5.2-5.8 Mb), and were more than 96% complete.
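To make the contiguity and coverage figures above concrete, the following minimal Python sketch shows how N50, L50, and mean coverage are conventionally computed. It is an illustration only, not the software used in this study; the function names and the toy numbers are hypothetical.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths in bp.

    N50 is the length of the contig at which the running total of the
    length-sorted contigs first reaches half of the assembly size;
    L50 is the number of contigs needed to reach that point.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


def coverage(total_bases, genome_size):
    """Mean coverage: sequenced bases divided by the assumed genome size."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size


# Toy contig lengths (hypothetical, not taken from the assemblies).
print(n50_l50([80, 70, 50, 40, 30, 20, 10]))  # (70, 2)

# Hypothetical example: 7.5 Gb of reads over an assumed 50 Mb genome.
print(coverage(7_500_000_000, 50_000_000))    # 150.0
```

Because coverage is computed against an assumed genome size, the reported values shift proportionally if the true genome size differs from ~50 Mb.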
Thus, the obtained genomes had high contiguity. After searching for telomeric repeats and visualizing them (Supplementary Figures S1, S2, and S3), we observed peaks at one or both ends of the obtained contigs. This indicated that the assembled contigs were probably large parts of chromosomes or complete chromosomes. At the time of writing, two chromosome-level assemblies were available in the NCBI database (Colletotrichum higginsianum IMI 349063, GCA_001672515.1, and Colletotrichum graminicola, GCA_029226625.1). The contig N50 values of these two assemblies are 5.2 and 5 Mb, respectively, and the L50 value of both assemblies is 5. In this study, we constructed contig-level assemblies for C. lini. However, the analysis of telomeric repeats suggested the presence of complete chromosomes. Thus, high coverage with long ONT reads probably allowed the sequences of complete chromosomes to be assembled. Moreover, the contiguity of the obtained assemblies is comparable to that of the chromosome-level assemblies prior to anchoring to chromosomes.
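The idea of checking contig ends for telomeric repeats can be sketched as follows. This is a simplified illustration, not the procedure used in this study: the motif TTAGGG (with its reverse complement CCCTAA), the window size, and the repeat threshold are all assumptions chosen for the example.

```python
# Assumed telomeric motif and its reverse complement (an assumption for
# this sketch; the motif searched for in the study is not specified here).
TELOMERE_MOTIFS = ("TTAGGG", "CCCTAA")


def count_motifs(seq, motifs=TELOMERE_MOTIFS):
    """Count non-overlapping occurrences of the motifs in a sequence."""
    seq = seq.upper()
    return sum(seq.count(motif) for motif in motifs)


def telomere_ends(seq, window=500, min_repeats=5):
    """Return (start_has_telomere, end_has_telomere) for one contig.

    An end is flagged when at least `min_repeats` motif copies occur
    within the first or last `window` bases (hypothetical thresholds).
    """
    start_hits = count_motifs(seq[:window])
    end_hits = count_motifs(seq[-window:])
    return start_hits >= min_repeats, end_hits >= min_repeats


# Toy contigs: one with repeats at both ends, one with repeats at one end.
both_ends = "CCCTAA" * 8 + "ACGT" * 300 + "TTAGGG" * 8
one_end = "CCCTAA" * 8 + "ACGT" * 300

print(telomere_ends(both_ends, window=100))  # (True, True)
print(telomere_ends(one_end, window=100))    # (True, False)
```

A contig flagged at both ends is a candidate complete chromosome, while a contig flagged at one end is more likely a chromosome arm or a large chromosome fragment.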
The assemblies were annotated using Funannotate. The resulting annotations had similar numbers of predicted gene models. The highly virulent strain (#390-1) had the highest number of gene models, 12891, and the moderately virulent strain (#757) had the lowest, 12520. This correlated with the BUSCO completeness of the assemblies: strain #757 had lower completeness than strain #390-1. Although the completeness of an assembly affects the accuracy of gene prediction, the highest number of gene models in the genome of the highly virulent strain may still be related to its high pathogenicity. In C. graminicola, ~15000 genes were predicted [74]. Thus, the number of predicted gene models for C. lini was of the same order as the values reported in the literature. As a primary analysis of virulence genes, we searched for encoded effector proteins in the obtained genome assemblies. Effector proteins are small cysteine-rich proteins that influence plant cellular processes to facilitate infection [75]. The weakly virulent strain #771 contained the lowest number of proteins with signal sequences, i.e., potentially secreted proteins, and the lowest number of uniquely annotated effectors (according to InterProScan). In the highly virulent strain, effectors made up 37.4% of the potentially secreted proteins. Meanwhile, the moderately virulent strain #757 had the lowest percentage of effectors, 36.2%. The percentages correlated with the total numbers of predicted gene models. Higher percentages of effector proteins are likely related to higher pathogenicity. However, the obtained results may be affected by fluctuations in the predicted values. Besides, the detection of an effector protein by the plant can also trigger plant immunity mechanisms [76]. Therefore, further research is needed to collect more information on the effector proteins of the studied fungi and to elucidate the true virulence mechanisms. In the genome of the weakly virulent strain, the smallest number of uniquely annotated effectors may be related to its low virulence. In a similar study, a strain of C. scovillei was characterized by defective growth and virulence, along with a reduced number of effectors [77].
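The notion of effectors as small cysteine-rich secreted proteins, and of their share among secreted proteins, can be illustrated with the simple heuristic below. It is a deliberately crude sketch and not the prediction method used in this study; the length and cysteine thresholds, the function names, and the toy sequences are hypothetical.

```python
def is_small_cysteine_rich(protein, max_length=300, min_cysteine_fraction=0.03):
    """Crude effector-like filter (hypothetical thresholds).

    A protein passes if it is at most `max_length` residues long and
    cysteine makes up at least `min_cysteine_fraction` of its residues.
    """
    protein = protein.upper().rstrip("*")  # drop a trailing stop symbol
    if not protein or len(protein) > max_length:
        return False
    return protein.count("C") / len(protein) >= min_cysteine_fraction


def effector_share(secreted_proteins):
    """Return (candidate names, percentage of secreted proteins that pass)."""
    if not secreted_proteins:
        raise ValueError("no secreted proteins given")
    candidates = [
        name
        for name, seq in secreted_proteins.items()
        if is_small_cysteine_rich(seq)
    ]
    share = round(100 * len(candidates) / len(secreted_proteins), 1)
    return candidates, share


# Toy secreted proteins (made-up sequences, for illustration only).
toy_secretome = {
    "prot_a": "MKLCSCAACKPCG" + "A" * 40,  # short, 4 cysteines: passes
    "prot_b": "M" + "A" * 99,              # no cysteines: fails
    "prot_c": "MC" * 200,                  # 400 residues, too long: fails
    "prot_d": "MCC" + "G" * 47,            # short, 2 of 50 cysteines: passes
}

print(effector_share(toy_secretome))  # (['prot_a', 'prot_d'], 50.0)
```

Dedicated effector predictors rely on far richer features than length and cysteine content, so a filter of this kind is useful only for conveying how a percentage of effectors among secreted proteins is obtained.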
To reveal the possible effect of genome rearrangements, we performed whole-genome alignment of the three C. lini genomes against each other. In the assembly of the weakly virulent strain #771, scaffold 6 contained one large inversion. Such genome rearrangements might be crucial for the function of certain genomic regions. The small scaffold 12 (0.7 Mb) of the weakly virulent strain #771 aligned to scaffold 13 of the highly virulent strain #390-1. However, this sequence was completely missing from the genome of the moderately virulent strain #757. Since this scaffold had an increased occurrence of repeats at its ends, we assumed that it could be a small pathogenicity-associated chromosome [52]. In addition, BLAST analysis of the annotated proteins from the scaffold showed that it encodes helicases, peptidases, and hydrolases. Therefore, the minichromosome may be implicated in replication, fungal growth, and necrotrophy.
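How a missing scaffold or an inverted segment can be read off whole-genome alignment blocks is sketched below. This is a minimal illustration under stated assumptions, not the alignment workflow used in this study: the block format (scaffold name, start, end, strand relative to the other genome), the function names, and the toy data are hypothetical.

```python
from collections import defaultdict


def merged_length(intervals):
    """Total length covered by half-open (start, end) intervals,
    counting overlapping regions only once."""
    total = 0
    covered_to = 0
    for start, end in sorted(intervals):
        start = max(start, covered_to)
        if end > start:
            total += end - start
            covered_to = end
    return total


def summarize(scaffold_lengths, alignments):
    """Per scaffold, report the aligned fraction and the fraction aligned
    in reverse orientation to the other assembly."""
    blocks = defaultdict(lambda: {"+": [], "-": []})
    for name, start, end, strand in alignments:
        blocks[name][strand].append((start, end))
    summary = {}
    for name, length in scaffold_lengths.items():
        plus, minus = blocks[name]["+"], blocks[name]["-"]
        summary[name] = {
            "aligned": merged_length(plus + minus) / length,
            "inverted": merged_length(minus) / length,
        }
    return summary


# Toy data (hypothetical): scaffold_A carries an internal inverted block,
# scaffold_B has no counterpart in the other assembly.
lengths = {"scaffold_A": 1000, "scaffold_B": 700}
toy_alignments = [
    ("scaffold_A", 0, 400, "+"),
    ("scaffold_A", 400, 800, "-"),
    ("scaffold_A", 800, 1000, "+"),
]

for name, stats in summarize(lengths, toy_alignments).items():
    print(name, stats)
```

A reverse-strand block flanked by forward-strand blocks on the same scaffold suggests an inversion, whereas a scaffold aligned entirely on the reverse strand usually reflects only the arbitrary orientation of the assembled sequence. An aligned fraction near zero marks a sequence absent from the other assembly.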
In this work, using ONT and Illumina data, we obtained the first three high-quality C. lini genomes and performed a primary comparative analysis of the assemblies. The differences in the number of effector proteins and the presence of a possible pathogenicity minichromosome point to possible determinants of high virulence. The assembled high-quality whole-genome sequences lay the foundation for a further in-depth search for molecular determinants of pathogenicity at both the chromosome and gene levels. Such data are indispensable for advancing disease management techniques and devising new strategies for breeding resistant varieties. Moreover, the obtained high-quality genomes of C. lini expand the knowledge of the genetic diversity of the genus Colletotrichum.
Author Contributions
Conceptualization, N.V.M. and A.A.D.; Investigation, E.M.D., E.A.S., T.D.M., T.A.R., L.P.K., R.O.N., A.A.T., D.A.Z., E.V.B., E.N.P., N.V.M., and A.A.D.; Writing, E.M.D., E.A.S., N.V.M., and A.A.D. All authors have read and agreed to the published version of the manuscript.