1. Introduction
There are about 52 cotton species, representing 9 different cytogenetic genomes; eight of these genomes are diploid and one is tetraploid [1]. Gossypium hirsutum and Gossypium barbadense have tetraploid genomes and are the most widely cultivated cotton species in the world [2]. Cotton is grown for both its fibre and its oil [3] and supplies 35% of the world's fibre requirements [4]. Cottonseed oil can be used in the production of biodiesel [5] and edible oils [6]. The genetic resources of such an important crop therefore merit comprehensive evaluation. Genetic resources play a very important role for breeders [7]. They are valuable not only in biological and molecular biology research but also for understanding genetic variation in breeding studies [8]. The collection, conservation and evaluation of genetic resources will therefore contribute greatly to the development of new varieties [3, 8, 9]. Studying genetic variation with classical methods is difficult, however, because germplasm is continuously collected and multiplied. One solution is to record genetic diversity with genetic markers, which are unaffected by location and season and measure allele frequencies directly [10, 11, 12].
Traditional agarose gel electrophoresis is the most widely used method for visualising amplicons when DNA markers are used to detect genetic diversity, because it is economical. Compared with capillary gel electrophoresis, however, it has been reported to be less effective at identifying genetic diversity [13]. The most important feature of capillary gel electrophoresis is that it reliably resolves amplicons differing by as little as 1 or 2 bp and quickly reveals the differences between them [14].
In this study, the genetic diversity of 47 cultivars belonging to G. hirsutum and G. barbadense was investigated using 19 microsatellite (EST-SSR) markers and high-resolution capillary gel electrophoresis. The main objective was to characterise the genetic diversity of cotton, a crop of great importance both worldwide and in our country.
2. Materials and Methods
Plant Material
A total of 47 varieties from the Turkish gene pool were used in the present study (Table 1). Two of these, Texas Marker-1 (TM-1, Gossypium hirsutum) and Pima 3-79 (a doubled haploid of Gossypium barbadense), are genetic standards. Ten of the genotypes belong to G. barbadense and the other 37 to G. hirsutum. Some of these genotypes were developed in our country through breeding studies; others were introduced into our country's genetic resources from abroad and are used in adaptation and breeding studies.
DNA was isolated using the method developed for cottonseed by Aydin et al. [15]. For each genotype, five seeds were crushed together under sterile conditions (bulk method). The quality and quantity of the isolated genomic DNA were checked with a NanoDrop spectrophotometer (Maestrogen, MN-013) and by 1% agarose gel electrophoresis, and normalised amounts of gDNA calculated from these measurements were used for PCR.
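Normalisation is a simple dilution calculation. The Python sketch below assumes a stock concentration read from the spectrophotometer in ng/µL. The example concentrations are hypothetical and are not measurements from this study. The code computes two things: the stock volume that delivers the 85 ng used per reaction, and the stock and water volumes for a working dilution via C1V1 = C2V2. The function names are illustrative.

```python
def dna_volume_ul(stock_ng_per_ul: float, target_ng: float = 85.0) -> float:
    """Volume of DNA stock (µL) needed to deliver target_ng into one PCR."""
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    return target_ng / stock_ng_per_ul


def dilute_to(stock_ng_per_ul: float, working_ng_per_ul: float,
              final_volume_ul: float) -> tuple[float, float]:
    """Return (stock µL, water µL) for a working dilution, using C1V1 = C2V2."""
    if not 0 < working_ng_per_ul <= stock_ng_per_ul:
        raise ValueError("working concentration must be positive and <= stock")
    stock_ul = working_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical readings: a 170 ng/µL stock needs 0.5 µL for 85 ng; diluting a
# 340 ng/µL stock to 17 ng/µL in 100 µL lets 5 µL of working stock carry 85 ng.
print(dna_volume_ul(170.0))
print(dilute_to(340.0, 17.0, 100.0))
```

Sub-microlitre volumes are hard to pipette accurately, which is one reason to prepare a working dilution rather than pipetting concentrated stock directly.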
A total of 19 SSR markers were used in PCR studies: 12 EST-SSR markers developed by Wang et al. [16] and 7 EST-SSR markers developed by Karaca and Ince [3] (Table 2). PCR amplifications were performed in a volume of 25 µL containing 85 ng of gDNA, 0.5 µM of each primer, 2.5 mM MgCl2, 0.28 mM dNTPs, 1 unit of Taq DNA polymerase (Thermo, Cat. EP0402) and 2.5 µL of 10X buffer. A touchdown PCR protocol was used. The annealing temperature was decreased by 0.5 °C per cycle for the first 10 cycles, and a further 30 cycles were run at the temperature reached in the 10th cycle. The program consisted of initial denaturation at 94 °C for 3 min; cycles of denaturation at 94 °C for 30 s, annealing at 60 or 66 °C (primer-specific annealing temperature, Table 2) for 45 s and extension at 72 °C for 1 min; and a final extension at 72 °C for 10 min [17]. Amplifications were performed on a Thermo Fisher Scientific thermal cycler (Ref: A24812).
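The touchdown schedule can be made explicit by generating the annealing temperature for each cycle. The sketch below assumes that the first cycle runs at the primer's nominal annealing temperature (60 or 66 °C, Table 2). On that assumption, the 10th cycle and the following 30 cycles run 4.5 °C lower. The text does not say whether the first decrement applies before or after cycle 1, so this is an interpretation. The step tuples are a hypothetical representation, not a thermal-cycler file format.

```python
def touchdown_annealing(start_ta: float, step: float = 0.5,
                        touchdown_cycles: int = 10,
                        plateau_cycles: int = 30) -> list[float]:
    """Annealing temperature (°C) for every cycle of a touchdown PCR.

    Assumes cycle 1 uses start_ta and each following touchdown cycle is
    `step` °C cooler; the plateau cycles repeat the last touchdown value.
    """
    temps = [start_ta - step * i for i in range(touchdown_cycles)]
    temps += [temps[-1]] * plateau_cycles
    return temps


def full_program(start_ta: float) -> list[tuple[str, float, int]]:
    """(step name, temperature °C, duration s) for the whole run."""
    steps = [("initial denaturation", 94.0, 180)]
    for ta in touchdown_annealing(start_ta):
        steps += [("denaturation", 94.0, 30),
                  ("annealing", ta, 45),
                  ("extension", 72.0, 60)]
    steps.append(("final extension", 72.0, 600))
    return steps


temps = touchdown_annealing(66.0)
print(len(temps), temps[0], temps[9], temps[-1])  # 40 66.0 61.5 61.5
```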
PCR amplicons were separated by capillary gel electrophoresis on a QIAxcel Advanced instrument (Qiagen, Cat. No./ID: 90001941) using the QIAxcel DNA High Resolution Kit (Cat. No. 929002) and analysed with QIAxcel ScreenGel version 1.6.
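The high-resolution kit separates fragments differing by only 1 or 2 bp. Even so, the fragment sizes reported for each sample still have to be grouped into allele bins before scoring. The sketch below assumes the sizes have already been exported as plain numbers per genotype; it does not assume any particular ScreenGel export format. It chains sorted sizes into bins using a hypothetical 1 bp tolerance. It then scores each genotype as present (1) or absent (0) for each bin, which is the binary form used by Jaccard-based analyses. The genotype names and sizes are invented for illustration.

```python
def bin_sizes(sizes, tolerance=1.0):
    """Chain sorted fragment sizes (bp) into allele bins.

    A size joins the current bin when it lies within `tolerance` bp of the
    previous size; each bin is reported by its mean size.
    """
    bins = []
    for s in sorted(sizes):
        if bins and s - bins[-1][-1] <= tolerance:
            bins[-1].append(s)
        else:
            bins.append([s])
    return [round(sum(b) / len(b), 1) for b in bins]


def score_binary(calls, tolerance=1.0):
    """Return (bin centres, genotype -> 0/1 presence vector over the bins)."""
    all_sizes = [s for sizes in calls.values() for s in sizes]
    centres = bin_sizes(all_sizes, tolerance)
    matrix = {}
    for genotype, sizes in calls.items():
        row = [0] * len(centres)
        for s in sizes:
            # assign each fragment to the nearest bin centre
            nearest = min(range(len(centres)), key=lambda i: abs(s - centres[i]))
            row[nearest] = 1
        matrix[genotype] = row
    return centres, matrix


calls = {"G1": [150.0, 162.0], "G2": [151.0, 165.0], "G3": [163.0]}
centres, matrix = score_binary(calls)
print(centres)  # [150.5, 162.5, 165.0]
print(matrix)   # {'G1': [1, 1, 0], 'G2': [1, 0, 1], 'G3': [0, 1, 0]}
```

A fixed-tolerance chain like this can merge neighbouring alleles if sizes drift between runs. In practice, bin widths would be checked against the repeat-unit length of each marker.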
Several statistics were computed from the amplicon data:

- the polymorphism information content (PIC) of each marker [8];
- Jaccard's coefficient, computed with the Multi-Variate Statistical Package (MVSP, version 3.13O, Kovach Computing Services, Pentraeth, UK), for principal coordinates analysis (PCoA);
- a Bayesian phylogenetic tree, built with MrBayes v3.2.1 x64 [18];
- population structure, analysed with STRUCTURE version 2.3.4, whose results were processed with the STRUCTURE HARVESTER online tool (http://taylor0.biology.ucla.edu/structureHarvester/) to determine the optimal K value [19, 20].

Expected heterozygosity (He) and observed heterozygosity (Ho) were calculated with PowerMarker v3.25 [21].
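STRUCTURE HARVESTER selects the optimal K with the Evanno Delta K method. This method divides the absolute second-order change in mean ln P(D) by the standard deviation of ln P(D) across replicate runs at each K. The minimal sketch below takes second differences on the replicate means and uses the sample standard deviation; both are assumptions. The K range, the number of replicates and the likelihood values are hypothetical, since the text does not report them.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Evanno Delta K from replicate ln P(D) values.

    lnp_by_k maps K -> list of ln P(D) from replicate STRUCTURE runs.
    Delta K(K) = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), using mean L values;
    it is undefined at the smallest and largest K tested.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnp_by_k[k])
            result[k] = second_diff / sd if sd > 0 else float("inf")
    return result


# Hypothetical ln P(D) values from three replicate runs at K = 1..5.
runs = {1: [-5000, -5002, -4998],
        2: [-4600, -4610, -4590],
        3: [-4300, -4302, -4298],
        4: [-4280, -4300, -4260],
        5: [-4270, -4290, -4250]}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

In this invented example the peak falls at K = 3. A sharp peak indicates the K at which the likelihood stops improving markedly relative to run-to-run noise.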
3. Results
3.1. Polymorphism Analysis of EST-SSR Markers
In this study, 19 EST-SSR markers were used to screen 47 cotton genotypes. The primers generated between 2 and 42 amplicons each (Table 2), with the fewest alleles observed for MK086 and the most for GD08-420. A total of 280 alleles were scored across the 19 EST-SSR primers, an average of 14.7 amplicons per primer. Across the 47 genotypes, markers MK086, MK126 and MK129 had the lowest PIC values, whereas PIC values were high for the remaining 16 markers. PIC values ranged from 0.268 (MK086) to 0.889 (GD02-301), with a mean of 0.603. Eleven EST-SSR markers were above this mean: GA04-1418, GA07-410, GD01-295, GD02-301, GD03-2002, GD06-2808, GD08-420, GD09-1296, GD10-1664, MK105 and MK173.
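The text does not give the PIC formula. A widely used definition, from Botstein et al., is PIC = 1 - Σ pᵢ² - Σᵢ<ⱼ 2pᵢ²pⱼ², where pᵢ is the frequency of allele i at a marker. The Python sketch below assumes that definition. Its last term is never negative, so PIC computed this way cannot exceed 1 - Σ pᵢ². This would be consistent with the reported PIC values sitting just below the corresponding expected heterozygosities (for example, 0.268 against 0.293 for MK086).

```python
from itertools import combinations


def pic(allele_counts):
    """Botstein PIC from allele counts or frequencies at one marker."""
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("need at least one allele observation")
    p = [c / total for c in allele_counts]
    homozygosity = sum(x * x for x in p)
    # Botstein correction for matings between identical heterozygotes
    # (ij x ij), which are not fully informative.
    cross = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1.0 - homozygosity - cross


print(pic([0.5, 0.5]))  # 0.375
```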
The mean expected heterozygosity (He) across all markers was 0.620 and the mean observed heterozygosity (Ho) was 0.433. He was lowest for MK086 (0.293) and highest for GD02-301 (0.898), and 11 EST-SSR markers were above the mean. Similarly, GD02-301 had the highest Ho value (0.883), whereas MK126 had the lowest (0.043).
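Expected and observed heterozygosity are simple functions of the genotype calls at a locus. The sketch below assumes each genotype has been scored as a pair of allele sizes, with None for missing data. Treating each locus as a two-allele call is a simplification for an allotetraploid crop. It is an assumption here, not a description of how PowerMarker was configured. He is computed in the uncorrected form 1 - Σ pᵢ², and Ho as the proportion of scored genotypes that are heterozygous. PowerMarker may apply a sample-size correction, so its values could differ slightly. The genotype data are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from (allele1, allele2) calls; None = missing."""
    counts = Counter(a for g in genotypes if g is not None for a in g)
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}


def expected_heterozygosity(genotypes):
    """He = 1 - sum(p_i^2), without a small-sample correction."""
    return 1.0 - sum(p * p for p in allele_frequencies(genotypes).values())


def observed_heterozygosity(genotypes):
    """Ho = share of scored genotypes whose two alleles differ."""
    scored = [g for g in genotypes if g is not None]
    return sum(a != b for a, b in scored) / len(scored)


# Hypothetical calls (fragment sizes in bp) at one locus for five genotypes.
locus = [(150, 150), (150, 152), (152, 152), (150, 154), None]
print(expected_heterozygosity(locus), observed_heterozygosity(locus))  # 0.59375 0.5
```

Ho falling below He, as in the reported means (0.433 against 0.620), is what one would expect in largely inbred or self-pollinated material.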
3.2. Clustering and PCoA Analysis
Amplicon data from the 19 EST-SSR markers were analysed for the 47 cotton genotypes. A phylogenetic tree was constructed by Bayesian inference with MrBayes (Figure 1) and coloured with FigTree v1.4.4.
In the clustering analysis, Pima 3-79 (G. barbadense) and Askabat-71 (G. barbadense) separated from the other genotypes with a posterior probability of 100%.
The remaining 45 genotypes were divided into groups coloured green, red and purple. The green and purple genotypes belong to G. hirsutum and the red ones to G. barbadense. The exception is two G. barbadense cultivars, Bahar-14 and Askabat-91, which fall within the green group. Although the red G. barbadense genotypes were placed among the purple genotypes, they clustered as a separate group. Only GB-58 (G. barbadense) was grouped in a cluster with BA320, Flas and PG-2018, with a posterior probability of 54%.
Principal coordinates analysis (PCoA) was performed on the EST-SSR data using Jaccard's similarity matrix to provide a different view of the genetic relationships between genotypes (Figure 2).
The PCoA plot shows that the genotypes are distributed almost evenly. Only the lower right quadrant (positive X-axis, negative Y-axis) contained few genotypes, and more than half of these were G. barbadense cultivars. Some genotypes formed tight groups. Caroline Queen, Gloria, Carisma, Gaia and Sezener-76 grouped in the upper left quadrant. Ligur, Gossypol-free-86, Lockette, Diva, Veret and Acala Royale grouped closely in the upper right quadrant (Figure 2).
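PCoA on a Jaccard matrix can be sketched as classical (Gower) principal coordinates analysis of Jaccard distances. The Python sketch below assumes NumPy, binary band-presence profiles (1 = band present) and the distance d = 1 - Jaccard similarity. MVSP's own distance transformation and axis scaling may differ, so these coordinates would not necessarily reproduce the published plot. The profiles are hypothetical.

```python
import numpy as np


def jaccard_similarity(x, y) -> float:
    """Shared bands / bands present in either genotype (binary 0/1 vectors)."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    union = np.logical_or(x, y).sum()
    if union == 0:
        return 1.0  # two empty profiles are treated as identical
    return float(np.logical_and(x, y).sum() / union)


def pcoa(binary, n_axes: int = 2):
    """Classical PCoA (Gower) of Jaccard distances, d = 1 - similarity."""
    X = np.asarray(binary, dtype=bool)
    n = X.shape[0]
    D = np.array([[1.0 - jaccard_similarity(X[i], X[j]) for j in range(n)]
                  for i in range(n)])
    A = -0.5 * D ** 2
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = J @ A @ J                            # double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # ascending order for symmetric B
    order = np.argsort(eigvals)[::-1]        # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coords = eigvecs[:, :n_axes] * np.sqrt(np.clip(eigvals[:n_axes], 0, None))
    explained = eigvals[:n_axes] / eigvals[eigvals > 0].sum()
    return coords, explained


# Hypothetical band profiles for four genotypes (rows) and four bins (columns).
profiles = [[1, 1, 0, 0],
            [1, 1, 0, 0],
            [0, 0, 1, 1],
            [1, 0, 1, 0]]
coords, explained = pcoa(profiles)
print(coords.round(3))
print(explained.round(3))
```

Eigenvector signs are arbitrary, so the plot may come out mirrored along either axis. Distances between points, and therefore groupings, are unaffected.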
3.3. Population Structure Analysis
Population structure was analysed for the 47 cotton genotypes using the 280 alleles scored with the 19 EST-SSR markers. Bayesian clustering was performed with STRUCTURE v2.3.4, using a burn-in period of 10,000 and 100,000 MCMC replicates. The Delta K method indicated an optimal K of 3 (Figure 3), dividing the 47 genotypes into 3 groups. Genotypes with a membership coefficient of 0.8 or greater were considered pure [21]. On this criterion, 30 of the 47 genotypes were pure: genotypes 1, 3, 5, 6, 7, 9, 10, 11, 13, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 29, 30, 31, 32, 33, 34, 36, 38, 40 and 41 (numbered as in Table 1).
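The purity rule (membership coefficient of 0.8 or greater) can be applied mechanically to the membership (Q) matrix that STRUCTURE reports. The sketch below assumes the Q values have been read into a dictionary keyed by genotype, with one coefficient per cluster in the order A, B, C. The genotype names and values are hypothetical.

```python
def classify_membership(q_matrix: dict[str, list[float]],
                        threshold: float = 0.8,
                        labels: str = "ABC") -> dict[str, tuple[str, str]]:
    """Assign each genotype to its highest-membership cluster and flag it as
    'pure' if that coefficient is >= threshold, otherwise 'admixed'."""
    result = {}
    for name, q in q_matrix.items():
        best = max(range(len(q)), key=lambda k: q[k])
        status = "pure" if q[best] >= threshold else "admixed"
        result[name] = (labels[best], status)
    return result


q = {"G1": [0.92, 0.05, 0.03],
     "G2": [0.55, 0.40, 0.05],
     "G3": [0.10, 0.05, 0.85]}
calls = classify_membership(q)
n_pure = sum(status == "pure" for _, status in calls.values())
print(calls, n_pure)
```

For an admixed genotype, the cluster with the second-largest coefficient is the group it is described as sharing ancestry with.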
The groupings are also plotted in Figure 4. Group A contained 18 genotypes, group B 16 and group C 13. Group C, the smallest, contains 2 G. barbadense genotypes (Askabat-91 and Bahar-14), the rest being G. hirsutum. In group B, only 1 of the 16 genotypes belongs to G. barbadense (Askabat-100). Group A, the largest, contains 18 genotypes, 7 of which belong to G. barbadense.
Within group A, the G. barbadense genotypes Pima 3-79 and GB-58 showed admixture with group C, while Askabat-71, Giza-45, Giza-70 and Giza-75 showed admixture with group B. Bahar-82 (G. barbadense) in group B showed admixture with group A, and Askabat-100 (G. barbadense) showed admixture with both groups A and C. In addition, Askabat-91 (G. barbadense) in group C showed some admixture with group A.
4. Discussion
Many marker techniques have been developed that exploit DNA sequences and genomic regions with different characteristics, and the choice of technique depends on the purpose of the study [22]. SSR markers are preferred for studies such as genetic diversity analysis and QTL identification [23, 24, 25]. Since the discovery of EST-SSRs by Cardle et al., these markers have been developed for many crops, including cotton [3, 26, 27, 28]. EST-SSR markers are widely preferred because they are reliable, reproducible, co-dominant, inexpensive, easy to use and readily transferable between species [29]. In cotton, they have been used extensively in genetic diversity studies [8, 30, 31], linkage studies [32], the assessment of abiotic stress tolerance [33] and mapping [34, 35].
In this study, a total of 19 EST-SSR markers developed for cotton by Wang et al. [16] and by Karaca and Ince [3] were used. Because these markers are located in expressed regions of the genome, they can be expected to have high discriminatory power, depending on the population studied. Using agarose gel electrophoresis (AGE), Karaca and Ince [3] found that markers MK086, MK132, MK146 and MK173 were monomorphic in G. hirsutum and G. barbadense. They also reported that, after conversion to CAPS markers (using the HinfI restriction enzyme), only MK129 was polymorphic. With capillary gel electrophoresis (CGE), however, we observed that these markers produced polymorphic amplicons directly. For these seven markers, PIC values ranged from 0.268 to 0.858, with a mean of 0.501. Various researchers have reported that the higher-resolution CGE method has advantages over AGE, including greater information content [13, 36, 37]. The EST-SSRs developed by Wang et al. [16] and used in our study generally have very similar repeat sequences. Nevertheless, their PIC values ranged from 0.437 to 0.889, with a mean of 0.663. These markers were located on randomly selected chromosomes of the A and D genomes, and in this study the PIC values of D-genome markers were higher than those of A-genome markers. Molecular data indicate that all allopolyploids in Gossypium share a common ancestry, supporting the hypothesis that polyploid formation occurred only once [38]. Furthermore, Wendel [39] reported that each allopolyploid genome contains the chloroplast of the A genome, that of Old World cotton. Differences within the D genome may therefore be responsible for the higher PIC values of the D-genome markers.
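The subset statistics in this paragraph can be cross-checked against the overall values in Section 3.1. The overall mean PIC should equal the marker-weighted mean of the two subsets: 7 markers from Karaca and Ince with a mean of 0.501, and 12 from Wang et al. with a mean of 0.663. In addition, 280 alleles over 19 markers should give the reported 14.7 amplicons per primer. The short Python check below assumes the subset means are simple averages over markers.

```python
subsets = {
    "Karaca and Ince (MK markers)": (7, 0.501),   # (number of markers, mean PIC)
    "Wang et al. (GA/GD markers)": (12, 0.663),
}
n_markers = sum(n for n, _ in subsets.values())
pooled_pic = sum(n * mean_pic for n, mean_pic in subsets.values()) / n_markers
alleles_per_marker = 280 / n_markers

print(n_markers, round(pooled_pic, 3), round(alleles_per_marker, 1))  # 19 0.603 14.7
```

The pooled value matches the overall mean PIC of 0.603 reported in Section 3.1.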
Determining genetic diversity is very important for the conservation and use of genetic resources and for breeding studies [8, 21]. Genetic variability, arising from the genetic relatedness and diversity between plant groups, is critical to the success of plant breeding [40, 41]. Better parental selection can greatly improve the development of new varieties. Markers are important in genetic studies for identifying heterotic groups, understanding population structure and distinguishing between basic lineages [10]. To reveal the diversity of the genotypes held in our unit, both cluster analysis and population structure analysis were used.
The genetic relationship between G. hirsutum and G. barbadense was analysed by Bayesian cluster analysis. Apart from two G. barbadense genotypes (Bahar-14 and Askabat-91), the genotypes of the two species were grouped separately. Furthermore, Aydın [15] reported that some seeds of Askabat-91 had the same characteristics as G. hirsutum, so seeds of this genotype are very likely to be contaminated. Besides Pima 3-79, the collection includes nine other G. barbadense genotypes, all cultivated for their fibre (Aydın, 2018). Giza-45, Giza-70, Giza-75 and GB-58 originate from Egypt, and Bahar-14, Bahar-82, Askabat-71, Askabat-91 and Askabat-100 from Turkmenistan. The cluster analysis separated Pima 3-79 and Askabat-71 from the other genotypes with a posterior probability of 100%. Pima 3-79 has been reported to be a doubled haploid and is considered the genetic standard [3, 15], which may explain why it forms a separate cluster in the Bayesian analysis. Accordingly, Askabat-71, which clusters with it, is likely to contain a high proportion of G. barbadense genetic material. The clusters were distinguished by colouring the phylogenetic tree: genotypes coloured purple and green belong to G. hirsutum, and those coloured red belong to G. barbadense. The G. hirsutum genotypes formed two distinct clusters, the purple cluster containing 18 genotypes and the green cluster 19. The level of variation is also limited by the narrow genetic base of cotton compared with other crops [31, 42, 43]. Because new cotton varieties are generally developed from existing varieties, this narrow genetic base is perpetuated. In the population structure analysis, the 47 genotypes were divided into 3 main groups; 30 genotypes were classed as pure and the remaining 17 as admixed. This suggests that heterozygosity is low. The high number of pure individuals indicates low variation, and new studies should be initiated to increase it. Although pure lines are used in crossbreeding, breeders prefer a high level of variation in order to develop varieties with the desired characteristics.