1. Introduction
Capsicum annuum, commonly known as chillies, peppers or bell peppers, is a plant species belonging to the genus
Capsicum in the Solanaceae family, diploid and self-pollinating crop that is native to southern North America and northern South America [
1]. The chilli peppers fruits are not only used as spices and vegetables but also have medicinal purposes and are rich in vitamins A and C. Additionally, they are utilized as natural coloring agents, in cosmetics and as active ingredients in host defense repellents [
2]. Although
C. annuum is widely cultivated in Romania due to its economic importance, most of the cultivars are of foreign origin. There are few local Romanian varieties and most of them have limited yield potential [
3]. Therefore, preserving and improving local pepper varieties requires evaluation of their degree of variation based on genetic characteristics and monitoring of the desired characters with valuable genotypes.
Molecular markers play a crucial role in plant genetics, they are essential for estimating the variability of plant varieties and species, helping to detect genetic relationships within plant genera [
4,
5]. In the case of
C. annuum, inter simple sequence repeat (ISSR) markers have been utilized to assess genetic diversity and population structure [
6], to identify genetic homogeneity and to generate molecular profiles to study genetic variability [
7]. Several studies have used microsatellites (SSRs) to characterize and generate a molecular genetic map of SSR loci for
C. annuum [
8], to study genetic variability [
9,
10] or to design SSR primers that are transferable between
Capsicum species [
11]. Moreover, whole genome sequencing (WGS) data can provide extensive insights into the molecular mechanisms that link the association of genotypes with diseases and traits, identifying relevant variants and assessing their functionality [
12,
13,
14,
15].
The germplasm collection of Vegetable Research and Development Station (VRDS) Buzău, Romania, consists of 214 accessions of
Capsicum spp. grouped by degree of genetic stability, as follows: stable, advanced and segregating [
3]. Seven Romanian pepper (
Capsicum annuum L.) genotypes (‘Decebal’/gDEC, ‘Vladimir’/gVLA, ‘Galben Superior’/gGAL, ‘Splendens’/gSPL, ‘Cosmin’/gCOS, ‘Roial’/gROI and ‘Cantemir’/gCAN) with valuable traits patented by the VRDS and registered in the Official Catalogue of Cultivated Plant Varieties in Romania (ISTIS), were studied in the present experiment for evaluating their genetic diversity, and were characterized in deep at molecular level using WGS for variant/mutation detection, Sanger sequencing, and genotyping with microsatellites molecular markers [
16].
3. Discussion
In Romania, several types of pepper varieties (bell pepper, red fibster pepper, sweet long red pepper or hot pepper) are traditionally cultivated for fresh consume and for preserves [
19]. Romanian consumers prefer locally grown vegetables for their taste, shape, color, size and pepper is one of the main vegetable crops preferred by the local consumers, so the breeders are constantly developing new varieties [
20,
21]. Despite this, there is a lack of molecular data and conservation status of these pepper varieties. Accordingly, seven Romanian pepper (
C. annuum L.) varieties with valuable traits, homologated by VRDS, were studied in the present experiment for evaluating their genetic diversity (
Supplementary Figure 12) as well as in deep molecular characterization using WGS for variant/mutation detection. The pepper varieties used in the present study were registered in the ISTIS catalog between 1974 and 2019 [
16] and maintained in conservative selection at VRDS development station. The varieties were chosen not only for their superior organoleptic properties, good yield and storage potential, but also for their resistance to biotic and abiotic factors [
22]. Studies regarding fruit quality [
23], seed germination [
24], fruit storage [
25], impact of environmental conditions over the crop growing [
26,
27], the influence of organic fertilizers [
28,
29], were performed on several Romanian varieties of pepper including gGAL, gCOS and gSPL. Several authors reported morphological and physiological studies [
30], seedling growth [
31], but no molecular studies have been published yet using WGS analysis in order to characterize VRDS pepper germplasm.
The dendrograms that shows the hierarchical trees generated by software (BIO-R) exposed that all varieties were clearly separated into two clusters. The ISSR profile revealed two clusters where gDEC and gVLA were grouped in one cluster based on their similarity and the second cluster with gROI and gSPL the first group and gGAL, gCOS and gCAN the second group, might suggests certain relationships and possible gene exchange among these varieties (
Figure 2). SSR profile revealed two clusters with gVLA and gDEC and gGAL very similar and another cluster with gROI, gSPL, gCAN and gCOS (
Figure 1). Multidimensional scaling analysis (MDS) in 2D represent the distances among the objects in a visual way were calculated with BIO-R software. MDS 2D variations for ISSR suggested a structure with related genotypes group as: gGAL, gCAN; gROI, gCOS, gSPL and unrelated genotypes: gDEC and gVLA. MDS 2D variations for SSR suggested a structure corresponding to combination as: gDEC,gGAL; gROI,gCAN; gSPL and gCOS at higher distance and unrelated genotypes gVLA (
Figure 11) .
Genomic alignment of ISSR-PCR 1000 bp cloned fragment against the pepper reference genome produced significant alignments with LTR retrotransposons. LTRs are mobile genetic elements characterized by their long terminal repeats essential for transposable elements integrations and some of the most abundant components found within eukaryotic genomes. [
32]. Genome–wide analysis of ISSR-PCR 1000 bp LTR retrotransposon showed 12 K distance from LOC 107854643 serine/threonine-protein kinase
ATG1t gene and 13 K distance from LOC 124889513
auxin-responsive protein SAUR68-like gene, both on
Capsicum annuum chromosome 12 UCD 10Xv1.1.
Multiple alignments were conducted in the SOL Genomics Database with alignment analyzer tool to identify potentially active LTRs within Solanaceae family. BLAST search against Solanaceae popular datasets revealed 99% identity of 1000 bp complete query length with
C. annuum Dempsey V1.0,
C. annuum Maor V1.0,
C. annuum UCD10X,
C. annuum Zunla genomes and 94% identity with
C. chinense genome scaffolds (release 0.5). In tomato reference genome (SL4.0) and tomato wild species as:
S. pimpinellifolium LA1670 genome and vs
S. pennellii BAC Ends only a short fragment of 100 bp had 86% identity with LTR between 600-700 bp query length. Against Potato genome and Potato Bac sequences, the same 100 bp short fragment against our clone, had 88% identity and an extra fragment of 200 bp with 79% identity was located at 5’end of the 1000 bp cloned LTR fragment. In Eggplant V 4.1,
Nicotiana benthamiana V 2.6.1. and
Petunia axillaris V 1.6.2. genomes only a 200 bp fragment located at 5’end had 80% identity. On the basis of sequence alignments, it seems that among Solanaceous species, transposable elements can move through genomes and may have experienced distinct degeneration events along with genome evolutionary history [
33,
34].
The pepper genome, as a result of large genome size, may be the best model for the analysis of genome expansion through evolution of constitutive heterochromatic regions [
35] and particularly LTR retrotransposons, are major factors that constitute the heterochromatin sequences [
36,
37,
38].
Database study revealed that the selected clone corresponding to 220 bp SSR-PCR marker is an expressed sequence tag (EST)-SSR molecular marker, identified from the transcribed region of the pathogenesis-related protein 10 (
PR-10) gene. This EST-SSR molecular marker was present in all analyzed genotypes indicating conservation of wide range PR-10 up-regulated protein, expressed during hypersensitive response upon infection by pathogens. PR-10 are multifunctional proteins, present throughout various plant tissues, playing significant role in growth, development and stress response with crucial role in plant defense against pathogens [
39,
40,
41,
42].
WGS analysis revealed that SNPs located upstream away from transcription start site of the gene and SNPs mutations in exonic region (CDS) were the most abundant on gSPL genotype. Moreover, the genotype gSPL presented the highest number of Stop gain (introduction of stop codon at the variant site) and Stop loss (removal of stop codon at the variant site) exonic SNPs mutations. The genotype sSPL (Splendens,
Capsicum annuum L.
spp. grossum (L.)
Filov. cousin. tetragonum Miller), red fibster pepper, is a variety registered in the ISTIS catalog in 2008. The lowest number of total SNPs was observed in gVLA genotype (Vladimir,
Capsicum annuum L.
spp. annuum convar. Microcarpum Filov.), a red hot pepper variety registered in the ISTIS catalog in 2015 (VRDS Buzău) [
16].
SNPs located within the intergenic region, transitions and transversions showed the highest value on gCOS genotype, as well as the total number of SNPs. gCOS (Cosmin,
Capsicum annuum L.
spp. longum (DC.) Terpó) is a sweet long red pepper variety registered in the ISTIS catalog in 1984 (VRDS Buzău). SNPs are associated with variations in phenotype and resistance to disease as they may alter protein structure and function, enhance the binding affinity of transcription factors, modify alternative splicing and regulate non-coding RNA [
43,
44,
45]. Additionally, SNPs have been used to identify QTLs underlying various traits in plants [
46,
47].
The highest density of Insertions, Deletions and Intergenic mutations located within the > 2 kb intergenic region was observed in genotype gCOS and the lowest number to gVLA genotype. InDels can affect the synthesis of proteins and functional RNA molecules. InDel mutations have been a valuable complement to SNPs and simple sequence repeats (SSRs) [
48]. InDel variations can be formed by unequal crossover, transposable elements and sequence replication in regions of repetitive DNA [
46,
49,
50].
Structural variants (SVs) are genomic variation with mutations of relatively larger size (>50 bp) that can have significant effects on gene expression and phenotype [
51]. Genotype gCOS showed the highest number of SVs located within the > 2 kb intergenic region, deletions, inversions, intra-chromosomal translocations and inter-chromosomal translocations. Hence the total number of SVs was assigned to gCOS genotype and the lowest one to gVLA genotype.
Copy-number variation (CNV) is a type of structural variation that contributes to phenotypic variance including duplications and deletions. CNVs, including duplications and deletions, can influence gene expression by disrupting gene coding sequences, perturbing long-range gene regulation or altering gene dosage [
52]. CNVs with increased copy number (duplications) and CNVs located in exonic region was observed in gVLA genotype. The gDEC genotype showed the highest number of Upstream/ Downstream CNVs located within the < 2 kb intergenic region and also the highest number of CNVs with decreased copy number (deletions). CNVs can influence gene expression by disrupting gene coding sequences and perturbing long-range gene regulation or altering gene dosage [
52]. For proper visualization of the structural variations on the whole-genome, sequencing data of gCOS genotype is presented according to mutation types with Circos plots. Circos uses a circular ideogram to facilitate the display of relationships between pairs of positions by the use of ribbons, which encode the position, size, and orientation of related genomic elements [
53].
The outer ring represents the chromosomes and inside the chromosomes ring are drawn the density of SNP/InDel type distribution as well as for SV/CNV type, the location and size (Novogene Co., Ltd., Cambridge, UK.). Whole genome variations distribution shown aligned regions between chromosomes connected with ribbons to illustrate relationships between genomic positions. gCOS displayed the highest number of SVs located within the > 2 kb intergenic region, deletions, inversions, intra-chromosomal translocations and inter-chromosomal translocations. Additionally, the width of the ribbon corresponds to the alignment length at specific locations. For gCOS, the 90-200 M region on chromosome 4 (NC_029980.1), shows large deletions and inversions as well as translocations that involve chromosomes 5 (NC_029981.1), 6 (NC_029982.1) and 7 (NC_029983.1). (
Figure 12). Also, 0-50 M region on chromosome 8 (NC_029984.1), shows large deletions and inversions that involve chromosomes 10 (NC_029986.1). A strong relation between genomic positions of 150-200M from chromosome 10 (NC_029986.1) is illustrated with 0-50 M from chromosome 12 (NC_029988.1) (
Figure 12). A visual representation of WGS data with Circos plots, SNPs and InDels density per chromosomes for all studied genotypes are presented in
Supplementary Figures 13.
Figure 1.
Dendrogram with agglomerative coefficients and diversity analysis for SSR markers.
Figure 1.
Dendrogram with agglomerative coefficients and diversity analysis for SSR markers.
Figure 2.
Dendrogram with agglomerative coefficients and diversity analysis for ISSR markers.
Figure 2.
Dendrogram with agglomerative coefficients and diversity analysis for ISSR markers.
Figure 3.
Frequency and type of SNPs mutations for each genotype: the most common SNPs mutation type distribution was C:G>T:A and T:A>C:G. The pie chart shows the number of SNPs in different regions of the genome for genotype gSPL; in exonic region, gSPL genotype presented the highest number of synonymous and non-synonymous SNP mutations.
Figure 3.
Frequency and type of SNPs mutations for each genotype: the most common SNPs mutation type distribution was C:G>T:A and T:A>C:G. The pie chart shows the number of SNPs in different regions of the genome for genotype gSPL; in exonic region, gSPL genotype presented the highest number of synonymous and non-synonymous SNP mutations.
Figure 4.
SNPs density per chromosomes for genotype gSPL.
Figure 4.
SNPs density per chromosomes for genotype gSPL.
Figure 5.
The length distribution of InDels for all genotypes within the coding sequence.
Figure 5.
The length distribution of InDels for all genotypes within the coding sequence.
Figure 6.
InDel density per chromosomes for genotype gCOS.
Figure 6.
InDel density per chromosomes for genotype gCOS.
Figure 7.
The number of SVs in different regions of the genome for all genotypes. gGAL and gCOS presented the highest number the Insertions (INS). The lowest number of INS was observed in gVLA and gROI genotypes. The details of SV detection statistics are as follows: CTX (Inter-chromosomal translocations); ITX (Intra-chromosomal translocations); INS (Insersion); DEL (Deletion); INV (Inversion); Splicing; Intergenic; Upstream/Downstream; Intronic; Downstream; Exonic; Upstream.
Figure 7.
The number of SVs in different regions of the genome for all genotypes. gGAL and gCOS presented the highest number the Insertions (INS). The lowest number of INS was observed in gVLA and gROI genotypes. The details of SV detection statistics are as follows: CTX (Inter-chromosomal translocations); ITX (Intra-chromosomal translocations); INS (Insersion); DEL (Deletion); INV (Inversion); Splicing; Intergenic; Upstream/Downstream; Intronic; Downstream; Exonic; Upstream.
Figure 8.
Variation type statistics distribution of CNVs in the genome.
Figure 8.
Variation type statistics distribution of CNVs in the genome.
Figure 9.
Multiple genomic alignment between the C. annuum reference genome Pepper Zunla 1 Ref_v1.0 unplaced genomic scaffold, the ISSR-PCR 1000 bp cloned fragment (LTR) UCD10Xv1.1 whole genome shotgun sequence ID: NC_061122.1 and all BAM files of Capsicum annuum local genotypes sequences from chromosome 12. On the right side the BLAST revealed SNPs mutations on cloned fragment for gSPL, gCAN and gROI genotypes on base position 41.494 and SNPs mutation on base position 41.809 for all seven genotypes. gDEC exhibits significant mutations on cloned fragment. On the right side is a close-up view of SNPs at specific positions.
Figure 9.
Multiple genomic alignment between the C. annuum reference genome Pepper Zunla 1 Ref_v1.0 unplaced genomic scaffold, the ISSR-PCR 1000 bp cloned fragment (LTR) UCD10Xv1.1 whole genome shotgun sequence ID: NC_061122.1 and all BAM files of Capsicum annuum local genotypes sequences from chromosome 12. On the right side the BLAST revealed SNPs mutations on cloned fragment for gSPL, gCAN and gROI genotypes on base position 41.494 and SNPs mutation on base position 41.809 for all seven genotypes. gDEC exhibits significant mutations on cloned fragment. On the right side is a close-up view of SNPs at specific positions.
Figure 10.
Multiple genomic alignment of C. annuum reference genome, SSR-PCR 220 bp cloned fragment (PR-10 protein) and chromosome 3 sequences for all seven genotypes revealed ts SNP mutations only on gCOS genotype. On the left side is presented graphical sequence view with a point mutation on base position 254.256.561 and another SNP on base position 254.256.599; on the right side, three nucleotide insertions as CTT type on 254,256,423 position for gCAN genotype and one nucleotide insertion as T type on 254,256,512 position for gCOS genotype.
Figure 10.
Multiple genomic alignment of C. annuum reference genome, SSR-PCR 220 bp cloned fragment (PR-10 protein) and chromosome 3 sequences for all seven genotypes revealed ts SNP mutations only on gCOS genotype. On the left side is presented graphical sequence view with a point mutation on base position 254.256.561 and another SNP on base position 254.256.599; on the right side, three nucleotide insertions as CTT type on 254,256,423 position for gCAN genotype and one nucleotide insertion as T type on 254,256,512 position for gCOS genotype.
Figure 11.
Multidimensional scaling analysis (MDS) for ISSR analysis based on Nei’s distance and SSR analysis based on modified Roger’s distance. CP1 and CP2 are the first and second principal coordinate matrices, respectively combination to related genotypes group: for ISSR analysis-gGAL, gCAN/ gROI, gCOS, gSPL and unrelated genotypes gDEC, gVLA are shown; for SSR analysis-gDEC,gGAL/ gROI,gCAN/ gSPL,gCOS and unrelated genotypes gVLA.
Figure 11.
Multidimensional scaling analysis (MDS) for ISSR analysis based on Nei’s distance and SSR analysis based on modified Roger’s distance. CP1 and CP2 are the first and second principal coordinate matrices, respectively combination to related genotypes group: for ISSR analysis-gGAL, gCAN/ gROI, gCOS, gSPL and unrelated genotypes gDEC, gVLA are shown; for SSR analysis-gDEC,gGAL/ gROI,gCAN/ gSPL,gCOS and unrelated genotypes gVLA.
Figure 12.
Visualization of the structural variations on the whole genome for gCOS genotype according with Circos plot analysis. The 90-200 Mb region on chromosome 4 (NC_029980.1), shows large deletions and inversions as well as translocations that involve chromosomes 5 (NC_029981.1), 6 (NC_029982.1) and 7 (NC_029983.1).
Figure 12.
Visualization of the structural variations on the whole genome for gCOS genotype according with Circos plot analysis. The 90-200 Mb region on chromosome 4 (NC_029980.1), shows large deletions and inversions as well as translocations that involve chromosomes 5 (NC_029981.1), 6 (NC_029982.1) and 7 (NC_029983.1).
Table 1.
Characteristics of SSR primers evaluated in Capsicum spp.
Table 1.
Characteristics of SSR primers evaluated in Capsicum spp.
ID |
Locus |
Primer (forward/ reverse) |
Size (bp) |
Tm (ºC) |
Total alleles |
PIC value |
SSRP3P4 |
AF244121 |
5'TACCTCCTCGCCAATCCTTCTG 3'/ 5'TTGAAAGTTCTTTCCATGACAACC 3' |
200-400 bp |
45 |
3 |
0.63 |
SSRP5P6 |
HpmS 1-148 |
5'GGCGGAGAAGAACTAGACGATTAGC3'/ 5'TCACCCAATCCACATAGACG 3' |
150-250 bp |
45 |
4 |
0.72 |
SSRP9P10 |
HpmS 1_1 |
5'TCAACCCAATATTAAGGTCACTTCC3'/ 5'CCAGGCGGGGATTGTAGATG3' |
260 pb |
49 |
NA |
NA |
SSRP11P12 |
HpmS 1_274 |
5'TCCCAGACCCCTCGTGATAG3'/ 5'TCCTGCTCCTTCCACAACTG 3' |
190-530 bp |
47 |
4 |
0.71 |
SSRP19P20 |
HpmS 1_172 |
5'GGGTTTGCATGATCTAAGCATTTT3'/ 5'CGCTGGAATGCATTGTCAAAGA3' |
230-420 bp |
48 |
3 |
0.66 |
Table 2.
Characteristics of ISSR primers evaluated in C. annuum varieties.
Table 2.
Characteristics of ISSR primers evaluated in C. annuum varieties.
ID |
Primer |
Tm (ºC) |
Total bands (TB) |
Range of the amplification product (bp) |
PIC value |
P21 |
5’ACGACAGACAGACAGACA3’ |
51 |
38 |
850-4000 bp |
0.08 |
P22 |
5’ACACACACACACACACCTG3’ |
50 |
28 |
500-2800 bp |
NA |
P23 |
5'GCAGACAGACAGACAGACGC3' |
50 |
68 |
500-4000 bp |
0.28 |
P24 |
5'GAGAGAGAGAGAGAGACTC 3' |
50 |
56 |
800-3800 bp |
0.23 |
P25 |
5'GAGAGAGAGAGAGAGACTC3' |
50 |
80 |
550-3100 bp |
0.29 |
P26 |
5'CACACACACACACACAAGT 3' |
51 |
27 |
1000-2500 bp |
0.26 |
P27 |
5'GACAGACAGACAGACAGT3' |
51 |
70 |
380-4000 bp |
0.20 |
P28 |
5'TCCTCCTCCTCCTCCAGCT3' |
50 |
34 |
350-2700 bp |
0.29 |
Table 3.
Statistics of SNPs detection and annotation based on WGS for all studied genotypes (Decebal/gDEC; Vladimir/gVLA; Galben Superior/gGAL; Splendens/gSPL; Cosmin/gCOS; Roial/gROI and Cantemir/gCAN). The details for SNP detection and annotation statistics are as follows: Upstream (SNPs located within 1 kb upstream); Exonic: SNPs located in exonic region (Stop gain, Stop loss, Synonymous, Non-Synonymous); Intronic; Splicing; Downstream; Intergenic; Transitions (ts); Transversions (tv); Heterozygous rate (Het. rate).
Table 3.
Statistics of SNPs detection and annotation based on WGS for all studied genotypes (Decebal/gDEC; Vladimir/gVLA; Galben Superior/gGAL; Splendens/gSPL; Cosmin/gCOS; Roial/gROI and Cantemir/gCAN). The details for SNP detection and annotation statistics are as follows: Upstream (SNPs located within 1 kb upstream); Exonic: SNPs located in exonic region (Stop gain, Stop loss, Synonymous, Non-Synonymous); Intronic; Splicing; Downstream; Intergenic; Transitions (ts); Transversions (tv); Heterozygous rate (Het. rate).
Genotype |
gDEC |
gVLA |
gGAL |
gSPL |
gCOS |
gROI |
gCAN |
Upstream |
98681 |
93968 |
97182 |
104198 |
100293 |
95159 |
98084 |
Exonic Stop gain
|
645 |
644 |
635 |
720 |
664 |
657 |
634 |
Exonic Stop loss
|
164 |
162 |
166 |
182 |
175 |
163 |
160 |
Exonic Synonymous
|
17898 |
17757 |
17668 |
19851 |
18012 |
17321 |
18229 |
Exonic Non-synonymous
|
29071 |
28736 |
29115 |
32146 |
29349 |
28540 |
29504 |
Intronic |
255395 |
240657 |
249296 |
280766 |
262290 |
246344 |
260011 |
Splicing |
307 |
286 |
308 |
353 |
303 |
295 |
319 |
Downstream |
80621 |
77439 |
79503 |
86690 |
82171 |
77640 |
80567 |
Upstream/ Downstream
|
5010 |
4943 |
5126 |
5596 |
5365 |
4728 |
5058 |
Intergenic |
6899284 |
6009182 |
6706315 |
7209789 |
7383834 |
6528591 |
6955410 |
Others |
229351 |
213191 |
222529 |
244099 |
236949 |
214605 |
233623 |
ts |
5051787 |
4442276 |
4923303 |
5313137 |
5408262 |
4796597 |
5099410 |
tv |
2565834 |
2245868 |
2485560 |
2672644 |
2712300 |
2418530 |
2583426 |
ts/tv |
1.969 |
1.978 |
1.981 |
1.988 |
1.994 |
1.983 |
1.974 |
Het. rate |
0.162 |
0.636 |
0.126 |
0.764 |
0.13 |
0.116 |
0.121 |
Total |
7617621 |
6688144 |
7408863 |
7985781 |
8120562 |
7215127 |
7682836 |