1. Introduction
Tumour formation is a pathological process that results from the uncontrolled proliferation of a group of cells. Tumours occur in virtually all multicellular organisms and are represented by newly formed tissues whose cells are actively proliferating. In animals, a system of proto-oncogenes and tumour suppressor genes forms a complex network which systemically controls the rate of cell division, growth and differentiation at the level of the whole organism. Disruption of such control, both under the influence of environmental factors and due to genomic instability, leads to the development of tumour growth. Higher plants contain functional orthologues of many mammalian tumour suppressors and oncogenes, but mutations in these genes in plants have not led to tumour formation, suggesting a very different principle of organization of the systemic control of cell division and differentiation in plants [
1,
2,
3].
Most examples of plant tumours arise as a result of interactions with a variety of pathogens and phytophages, from bacteria and viruses to nematodes and arthropods [
4]. More rarely, spontaneous tumours develop in plants with specific genotypes (mutants, interspecific hybrids, inbred lines) in the absence of pathogens, making them more similar to animal tumours [
1,
4]. The exact causes of spontaneous tumour formation have only been studied in
Arabidopsis mutants that have defects in cell-to-cell adhesion due to loss of function of enzymes involved in the biosynthesis or modification of cell wall components [
5,
6,
7,
8]. The study of these mutants has revealed a previously unknown aspect of the systemic control of cell division in plants, bringing cell adhesion to the fore [
1]. At the same time, not all tumour mutants of
Arabidopsis and other plants have impaired cell adhesion. Studying other examples of plant tumours could help identify other systemic regulators of cell division in higher plants.
The objects of our research are spontaneous tumours of the inbred lines of European cherry radish (
Raphanus sativus var.
radicula Pers.) (
Figure 1A). The genetic collection of radish inbred lines has been maintained at St. Petersburg State University (SPbSU) since the 1960s by selfing individual plants and now contains thirty-three highly inbred lines originating from four radish cultivars. Eleven radish inbred lines stably form tumours on the taproots of plants at the flowering stage ([
9,
10],
Figure 1).
As with most examples of spontaneous tumours in plants, the mechanism triggering tumour formation in the radish inbred lines is unknown. Tumours on radish taproots originate from the pericycle and cambium as callus-like structures and later acquire features of secondary differentiation, such as vasculature, and meristematic foci similar to root apical meristems due to auxin maxima and
WOX5 expression [
11]. RNA-seq of radish tumours compared with lateral roots revealed differential expression of more than 1600 genes [
12]. Most of the pathways upregulated in radish tumours were associated with the control of cell division, showing extreme activation of this process in the tumour tissue [
12].
In the present work, we sequenced the genomes of two closely related radish inbred lines, 18 and 19, that differ in their ability to form tumours ([
9],
Figure 1). In genetic crosses between these two lines, this trait was inherited as a monogenic recessive [
13], providing an opportunity to identify a specific gene that regulates spontaneous tumour formation.
As a result, a number of SNVs (InDels and SNPs) were revealed in the tumour radish line. Among these, we found more than a hundred SNVs in the CDS of protein-coding genes that are thought to lead to changes in protein structure ("stop lost" / "stop gained" or a frameshift) or at positions -1 to -20 of the 5’-UTR, where they could severely influence translation efficiency [
14]. Many of the genes with such SNVs in the tumour-forming line are homologs of
Arabidopsis genes, which are involved in cell cycle regulation, cytoskeleton organisation, meristem development and phytohormone homeostasis. Among them, we selected 108 SNVs which are in the homozygous state in the tumour radish line. The presence of the selected InDels and SNPs in the radish tumour line was verified by sequencing the amplicons of the corresponding gene regions in the radish lines 18 and 19.
To search for an association of SNVs with spontaneous tumour formation, we sequenced 40 SNV-containing gene regions in seven tumour and fourteen non-tumour radish lines of the SPbSU genetic collection. As a result, we found that the RsERF018 gene contains a CAG insertion in the 5’-UTR close to the start codon in most tumour radish lines and in only two non-tumour lines, which allows us to propose it as a candidate regulator of spontaneous tumour formation.
Using the genome assemblies of the two radish inbred lines, we identified and determined the chromosomal location of the genes belonging to the
CLE and
WOX families which are known to be master regulators of meristem identity and stem cell homeostasis. Among them, we identified new, previously uncharacterised radish
CLE genes which are likely to encode proteins with multiple CLE domains. Homologs of such a group of
CLEs are absent in
Arabidopsis, but have been identified in
Brassica napus [
15].
The sequencing of the genome of the tumour radish line may be a step towards identifying new mechanisms underlying spontaneous tumour formation in higher plants.
2. Results
2.1. Assessment of the Assembly Quality of the Genomes of Two Radish Inbred Lines
To compare the genomic DNA sequences of tumour radish line 19 and non-tumour radish line 18, we performed a hybrid chromosome-level assembly using a combination of data obtained by Illumina and Oxford Nanopore sequencing methods.
Assembly quality assessment using the BUSCO programme (
https://busco.ezlab.org/) showed that the proportion of single-copy nuclear genes was 92.2% for line 19 and 91.1% for line 18. The proportion of duplicated sequences was 6.4% for line 19 and 5.7% for line 18, and the overall assembly quality index was 98.6% for line 19 and 96.8% for line 18, indicating a low content of fragmented or incomplete sequences and no contamination by sequences from other phylogenetic taxa (
Figure 2). The assembly parameter values obtained using the Quast programme indicated that the genome size of line 18 was 492,907,896 bp with N50 = 12750, and that of line 19 was 480,234,765 bp with N50 = 13846043. These parameters are comparable to the characteristics of reference radish genomes [
16,
17,
18].
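For readers less familiar with the N50 statistic reported by Quast, the following minimal Python sketch shows how it is commonly computed from a list of contig or scaffold lengths. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from our assembly pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with hypothetical contig lengths (bp).
toy_lengths = [2, 3, 4, 5, 6]
print(n50(toy_lengths))
```

Because the statistic depends only on the sorted lengths, it can be recomputed directly from a FASTA index of any assembly for comparison with published values.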
BUSCO analysis of the genome assemblies of lines 18 and 19 gave quality indicators of 93.8 and 98.9, respectively. Thus, the quality of the radish genome assemblies obtained in this work is not inferior to that of the assemblies available in the NCBI database (
https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=3725; available on 23.01.2024).
2.2. Identification of SNVs in the Protein-Coding Genes of Tumour Radish Line
When analysing the genome sequences of tumour and non-tumour radish lines, we have identified a large number of SNVs (514083 InDels and 2260270 SNPs) in tumour line 19 (
Table 1, Suppl. Fig. 1). Among them, 35399 InDels and 688148 SNPs were located in the CDS of protein-coding genes or at positions -1 to -20 of the 5’-UTR. Of these, 21698 InDels and 9451 SNPs were likely to alter the translation of the corresponding proteins due to a frameshift, loss of a start or stop codon, gain of a start codon, or decreased translation efficiency caused by changes in the 5’-UTR close to the start codon [
14].
Among the genes with these SNVs, we selected 240 InDels and 135 SNPs in genes annotated with GO terms probably associated with the control of plant cell proliferation: regulation of cell growth (GO:0008283, GO:0007346, GO:0010564, GO:0000278, GO:0051726, GO:0006261, GO:0042023, GO:0000910, GO:0000911, GO:0000226, GO:0009828, GO:0009505, GO:0009825), meristem activity (GO:0010014, GO:0010075, GO:0009933), phytohormone signalling (GO:0009736, GO:0009690, GO:0009686, GO:0045487, GO:0009734, GO:0009733, GO:0009735, GO:0009739), gene expression regulation (GO:0003700, GO:0006306, GO:0034968, GO:0051567), and organogenesis (GO:0048364, GO:0048527, GO:0090451).
Among the genes belonging to these GO categories, 72 genes with InDels and 36 genes with SNPs carried the variant in the homozygous state in radish line 19. Of these 72 InDels, 57 resulted in a frameshift, 9 in a frameshift and loss of the start codon, 5 in a frameshift and gain of a stop codon, and 1 in a change of the 5’-UTR near the start codon. Of the 36 SNPs, 23 resulted in stop codon gain, 10 in stop codon loss, and 3 in start codon loss. We determined the chromosomal location of genes with such SNVs (
Figure 3 and
Figure 4). More detailed information on these genes is presented in Supplementary Tables 1 and 2.
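The selection logic described above (probable loss-of-function effect, GO term of interest, homozygous state) can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the record layout, the effect labels in `HIGH_IMPACT`, the gene names and the function `select_candidates` are hypothetical and do not reproduce the output format of any particular annotation tool.

```python
# Effect classes treated as probable loss-of-function in this sketch
# (labels are hypothetical, not the output of a specific annotation tool).
HIGH_IMPACT = {"frameshift", "start_lost", "stop_gained", "stop_lost", "utr5_near_start"}


def select_candidates(variants, go_terms_of_interest):
    """Keep homozygous-alternate variants with a high-impact effect
    in genes annotated with at least one GO term of interest."""
    selected = []
    for v in variants:
        homozygous_alt = v["genotype"] in ("1/1", "1|1")
        high_impact = bool(HIGH_IMPACT & set(v["effects"]))
        relevant_go = bool(go_terms_of_interest & set(v["go_terms"]))
        if homozygous_alt and high_impact and relevant_go:
            selected.append(v)
    return selected


# Toy records; gene names are placeholders.
toy_variants = [
    {"gene": "geneA", "genotype": "1/1", "effects": ["frameshift"], "go_terms": ["GO:0051726"]},
    {"gene": "geneB", "genotype": "0/1", "effects": ["stop_gained"], "go_terms": ["GO:0051726"]},
    {"gene": "geneC", "genotype": "1/1", "effects": ["missense"], "go_terms": ["GO:0010075"]},
    {"gene": "geneD", "genotype": "1/1", "effects": ["stop_lost"], "go_terms": ["GO:0005575"]},
]
go_of_interest = {"GO:0051726", "GO:0010075"}
candidates = select_candidates(toy_variants, go_of_interest)
print([v["gene"] for v in candidates])
```

In this toy set only `geneA` passes all three criteria: `geneB` is heterozygous, `geneC` has no high-impact effect, and `geneD` lacks a GO term of interest.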
It can be assumed that the abovementioned SNVs could lead to loss of function of the corresponding protein-coding genes in the tumour radish line, and thus each of these SNVs could cause tumour formation. The effects of loss-of-function mutations in some of these genes on plant development have also been described for their homologs in Arabidopsis (Suppl. Tables 1, 2).
2.3. Search for the Presence of Identified SNVs in the Tumour and Non-Tumour Lines of the Radish Genetic Collection
To search for probable candidate regulators of spontaneous tumour formation among genes containing selected SNVs in the line 19, we amplified the corresponding gene regions of several other tumour (12, 13, 14, 16, 20, 21, 32) and non-tumour (3, 5, 6, 8, 9, 23, 25, 26, 27, 28, 29, 30, 37, 39) lines of the radish genetic collection.
As a result, the presence of the same SNV in most tumour lines was confirmed for the
RsERF018 gene (
Figure 4). As for the other 39 genes, in some of them SNVs were only identified in line 19, or there was a polymorphism that was not associated with the tumour formation trait.
The
RsERF018 gene, whose homolog in
Arabidopsis controls response to ethylene and cambium cell division [
19], contains a CAG insertion just upstream of the start codon of the gene in the tumour lines 12, 13, 14, 19, 20 and 21, and also in non-tumour lines 26 and 27, whereas no insertion was detected in the tumour lines 16 and 32 or in most non-tumour lines (
Figure 4). According to data obtained in
Arabidopsis, changes of this kind at positions -1 to -20 of the 5’-UTR dramatically decrease the efficiency of translation [
20].
The RsERF018 gene needs to be further investigated as a possible regulator of spontaneous tumour formation.
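As an illustration of how the strength of such an association could be gauged, the sketch below implements a one-sided Fisher's exact test on a 2x2 table using only the Python standard library. This test was not part of the analysis reported here; the function name `fisher_one_sided` is an assumption, and the counts are read from the line lists given above with lines 18 and 19 included.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins,
    i.e. the probability of seeing at least `a` insertion carriers
    among the tumour lines by chance.
    """
    row1 = a + b          # number of tumour lines
    col1 = a + c          # number of lines carrying the insertion
    total = a + b + c + d
    max_a = min(row1, col1)
    denom = comb(total, row1)
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, max_a + 1)
    )
    return tail / denom


# Counts read from the line lists in this section (lines 18 and 19 included):
# tumour lines: 6 with the insertion, 2 without;
# non-tumour lines: 2 with the insertion, 13 without.
p_value = fisher_one_sided(6, 2, 2, 13)
print(f"one-sided p = {p_value:.4f}")
```

With so few lines, and with the inbred lines sharing ancestry, any such value should be read as descriptive rather than as formal evidence of causality.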
2.4. Identification and Chromosomal Localisation of WOX and CLE Genes in the Obtained Genome Assemblies of Inbred Radish Lines
Meristem regulators are known to be involved in the control of the plant cell division plan, and they were shown to participate in the development of numerous examples of plant tumours [
4]. The balance between cell division and differentiation in various plant meristems is under the control of the WOX-CLAVATA system, a highly conserved regulatory module [
21], which consists of CLAVATA3/EMBRYO SURROUNDING REGION-related (CLE) peptides, the protein kinase receptors that bind CLEs, and the targets of CLE action, the WUSCHEL-RELATED HOMEOBOX (WOX) homeodomain transcription factors [
22,
23,
24].
We carried out the identification of the radish
CLE and
WOX family genes in our genome assemblies of radish lines 18 and 19 (
Figure 5, Suppl. Fig. 2, 3). A total of 52
RsCLE genes and 24
RsWOX genes were found. All 24
RsWOX genes were previously identified [
25]. Among
RsCLE genes, 16
RsCLEs were identified in our previous work [
26], and other
RsCLE genes were annotated in the reference radish genome [
16]. The chromosomal location of
RsWOX and
RsCLE genes (
Figure 5) revealed clusters of closely located
RsCLEs on radish chromosomes 2, 4 and 9.
It is important to note that the genes RsWOX2, RsWOX14 and RsCLE7 were among those in which SNVs in the CDS that probably lead to loss of function were identified in tumour radish line 19 (Suppl. Figure 6, Suppl. Tables 1, 2). At the same time, these SNVs were only confirmed in tumour line 19 and not in other radish tumour lines.
2.5. Identification of Radish CLE Genes Likely to Encode Proteins with Multiple CLE Domains
Among all the
RsCLE genes identified in this work (
Figure 7), we have found two unique
RsCLEs of unknown function, which probably encode proteins with multiple CLE domains. We then found the same genes in the radish reference genome, where they had not been described as
CLE genes and were named in the NCBI database as actin-binding protein wsp1-like (LOC108807713) and proline-rich receptor-like protein kinase PERK10 (LOC108858878). We have uploaded the sequences of these genes found in our assemblies to the NCBI database (Submission ID: 2791313, GenBank numbers PP236904.1 and PP236905.1) under the names
RsCLEm1 and
RsCLEm2 (“
RsCLE multidomain”).
Each of
RsCLEm genes contains eight tandem CLE domain sequences separated by short spacers (
Figure 7). The
CLE genes encoding multidomain CLE proteins were previously identified and functionally studied in
Brassica napus [
15], but were absent in
Arabidopsis.
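A simple way to flag candidate multidomain CLE proteins is to count non-overlapping CLE-like motifs in a predicted protein sequence. The sketch below is a minimal illustration only: the regular expression is a simplified, assumed consensus for the 12-residue CLE domain, the toy protein is hypothetical, and the function `find_cle_domains` does not represent the procedure used to annotate the RsCLEm genes.

```python
import re

# Simplified CLE-like motif used only for illustration (an assumption):
# R-x-x-P-x-G-P-[D/N]-P-x-H-[H/N], 12 residues.
CLE_MOTIF = re.compile(r"R..P.GP[DN]P.H[HN]")


def find_cle_domains(protein):
    """Return (start, matched_sequence) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in CLE_MOTIF.finditer(protein)]


# Hypothetical protein: three CLE-like domains separated by short spacers.
toy_protein = (
    "MKSSLA"
    + "RTVPSGPDPLHH" + "GSEE"
    + "RLVPSGPNPLHN" + "GSEE"
    + "RRVPTGPDPIHH"
)
hits = find_cle_domains(toy_protein)
print(len(hits), [start for start, _ in hits])
```

A strict pattern of this kind will miss divergent CLE domains, so in practice profile-based searches would be a more sensitive choice; the regex is used here only to make the idea of tandem domains separated by spacers concrete.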
3. Discussion
To date, radish genome sequencing has been carried out for several Asian and European cultivars and isolates [
16,
17,
27,
28,
29,
30]. The Rs1.0 genome, which is a radish reference genome, was based on the chromosome sequences of
R. sativus of the Korean cultivar WK10039 [
16].
In our work, we sequenced the genomes of two closely related radish inbred lines which differ in their ability to form spontaneous tumours [
9,
10,
11,
12,
13]. This is the first attempt to sequence the genome of plants with spontaneous tumour formation.
To date, the most well-studied examples of spontaneous tumours in higher plants are several monogenic mutants of
Arabidopsis [
5,
31,
32,
33] and one of
Nicotiana tabacum [
34], which form tumours on different organs of seedlings. In most cases, tumours in these mutants are the result of loss of function of pectin metabolism genes, which are involved in cell wall formation and cell adhesion [
5,
6,
7,
8]. The discovery of such mutants showed that cell adhesion is one of the mechanisms that systemically regulate cell proliferation in the plant body. However, cell adhesion is not the only mechanism of such systemic regulation. In
Arabidopsis there are also tumour-forming mutants with loss of function of other genes whose association with tumour development is much less obvious, such as the gene encoding the immunophilin family protein [
31], the tyrosine phosphatase-like protein [
35], and the chromatin remodelling factor [
36]. Thus, the identification of plant genes whose loss of function leads to spontaneous tumour formation will help to identify new systemic mechanisms for cell division control in higher plants.
In our work, we have identified numerous SNVs, including those in the CDS or at positions -1 to -20 of the 5’-UTR of protein-coding genes, that distinguish the tumour radish line from the related non-tumour line. Therefore, we can assume that certain SNVs could be inducers of spontaneous tumour formation. According to transcriptome analysis of the roots and spontaneous tumours in the radish inbred line, all 108 genes with loss-of-function SNVs in tumour line 19 were expressed in radish taproots [
12]. Moreover, five genes with such SNVs identified in this study were among the DEGs: the expression levels of the cell cycle regulator
RsPCNA1 and the gene of unknown function
LOC108817684 were increased in the tumours, whereas the expression levels of the radish homologs of the auxin response gene
RsSAUR32, the ethylene response cambium-associated genes
RsERF018 and
RsERF019, and also the
RsLRR-RK gene encoding receptor-like protein kinase were decreased [
12].
Due to the large number of SNVs identified, it is currently not possible to make clear assumptions about the role of each SNV in spontaneous tumour formation. Additional testing for the presence of the identified SNVs in tumour and non-tumour radish lines revealed that a CAG insertion at position -1 of the 5’-UTR of the
RsERF018 gene was present in six of the eight tumour radish lines tested (all except lines 16 and 32) and absent in thirteen of the fifteen non-tumour lines. Without the insertion, this region contained an AAA sequence just before the start codon, which should result in high translation efficiency [
20]. Therefore, an insertion of CAG between the start codon and the AAA region (
Figure 4) should result in a considerable decrease in the amount of the translated protein, as has been shown in
Arabidopsis [
20].
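The effect of the insertion on the sequence context of the start codon can be made concrete with a short sketch. The flanking nucleotides of the toy 5’-UTR fragment below are hypothetical; only the AAA triplet preceding ATG and the CAG insertion at position -1 are taken from the text, and both helper functions are illustrative assumptions.

```python
def upstream_context(mrna, start_codon_index, width=3):
    """Return the `width` nucleotides immediately upstream of the start codon."""
    if mrna[start_codon_index:start_codon_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    return mrna[max(0, start_codon_index - width):start_codon_index]


def insert_before_start(mrna, start_codon_index, insertion):
    """Insert `insertion` directly upstream of the start codon (position -1)."""
    return mrna[:start_codon_index] + insertion + mrna[start_codon_index:]


# Hypothetical fragment: arbitrary leader, then AAA, then the start codon.
reference = "GCTCTAAA" + "ATGGCT"
start = reference.index("ATG")
mutant = insert_before_start(reference, start, "CAG")

ref_context = upstream_context(reference, start)
mut_context = upstream_context(mutant, start + len("CAG"))
print(ref_context, mut_context)
```

In the toy reference the three nucleotides preceding ATG are AAA, whereas after the insertion they are CAG and the AAA triplet is shifted three positions further upstream.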
In this work, we also characterised and chromosomally localised genes of the WOX and CLE families in the genomic sequences of radish lines from the SPbSU genetic collection. Among the RsWOX and RsCLE genes, loss-of-function SNVs were detected in the RsWOX14, RsWOX2 and RsCLE7 genes in line 19 (Suppl. Figure 6).
In
Arabidopsis, the
WOX14 gene is a regulator of cambium and xylem balance and acts redundantly with
WOX4 [
37]. The
WOX2 gene is known to be a regulator of early embryogenesis and callus formation [
38]. The
CLE7 gene in
Arabidopsis also functions as a regulator of callus formation and regeneration [
39]. Since, according to our previous data, spontaneous tumours on radish taproots originate from the cambium and develop as undifferentiated callus-like structures [
11], these genes are promising candidates for tumour regulators. However, our data do not support this, as none of the corresponding SNVs were found in the sequences of these genes in the other radish tumour lines studied.
The genes
RsWOX14,
RsWOX2 and
RsCLE7 are represented by a single copy in the radish genome, but homozygosity for loss-of-function mutations in them does not result in reduced viability of the radish line 19. According to available data, a single mutation in each of these genes in
Arabidopsis did not cause any serious developmental abnormalities in the mature plants [
37].
Analysis of the genomes of radish lines also allowed us to identify two
RsCLE genes,
RsCLEm1 and
RsCLEm2, which are likely to encode proteins with multiple CLE domains and a unique CLE domain composition (
Figure 7). There are no identified homologs of these genes in
Arabidopsis, but they are related to the
B. napus CLEm genes, which encode multidomain CLE proteins that function as light stimulators of shoot apical meristem activity [
15]. The
RsCLEms contain eight tandem CLE domain sequences and are close to
BnCLEm3, whose product contains five nearly identical tandem CLE domains [
15].
Thus, in addition to identifying SNVs probably associated with tumours, the sequencing of the radish inbred lines allowed the identification of novel CLE family genes.
Author Contributions
Conceptualization, I.D. and L.L.; methodology, X.K., A.A., E.G., L.D., M.G., V.T., N.G.; software, X.K., A.A., L.D.; validation, X.K., A.A., L.D.; formal analysis, X.K.; investigation, X.K.; resources, X.K.; data curation, X.K.; writing (original draft preparation), X.K.; writing (review and editing), I.D. and V.T.; visualization, K.K. and M.G.; supervision, I.D.; project administration, L.L.; funding acquisition, L.L. All authors have read and agreed to the published version of the manuscript.