1. Introduction
Tomatoes (
Solanum lycopersicum), as a globally valued horticultural crop, owe their economic impact, nutritional benefits, and culinary versatility to the genetic diversity within the species [
1]. Enhancing the genetic makeup of tomatoes has long been a priority for plant breeders, with a focus on improving traits such as yield, disease resistance, fruit quality, and environmental adaptability [
2,
3]. The genetic diversity within the Solanaceae family, which includes tomatoes, potatoes, and peppers, has been instrumental in breeding programs; nearly all disease resistances in contemporary tomato varieties originate from their wild relatives [
4].
The domestication and evolution of tomatoes have been profoundly shaped by the partitioning of genetic variation within and among populations, a process illuminated through population structure analysis. This genetic structure, influenced by historical processes like migration, natural selection, and genetic drift, has led to the formation of genetically distinct groups within the tomato species' range [
5]. Each tomato population bears unique genetic signatures shaped by environmental and historical factors [
6], making the identification of population structure crucial for understanding the extent of genetic differentiation and the factors contributing to it [
7]. Originating in the South American Andes with domestication in Mexico [
8], tomatoes spread globally, first to Europe in the 1500s and subsequently worldwide [
9], leading to significant genetic changes. The domestication process selected traits favorable to human cultivation, such as larger fruit size, reduced seed dispersal, and improved taste [
10], but also resulted in a genetic bottleneck, leaving cultivated varieties with less genetic diversity than their wild relatives [
11]. Tomatoes were introduced to the United States in the early 18th century, initially as ornamental plants due to their exotic appearance [
8]. Over time, they became an important food crop, with cultivation spreading across different regions of the country [
9]. The diverse climates across the United States—from the hot, humid Southeast to the cooler, arid West—necessitated the selection of varieties adapted to local conditions, which in turn contributed to the development of a broad range of tomato cultivars with varying traits [
10]. The genetic diversity of U.S. tomato germplasm is influenced by the genetic bottleneck that occurred during the early phases of tomato domestication and introduction. The tomatoes brought to the U.S. from Europe were already a subset of the original genetic diversity found in their native range in South America [
4]. However, American farmers and breeders expanded this diversity through the introduction of new varieties and the selection of plants with desirable traits such as disease resistance, yield, and fruit quality [
12].
The U.S. has played a pivotal role in the development of modern tomato varieties, particularly through its extensive breeding programs. These programs have focused on improving traits like disease resistance, fruit quality, shelf life, and adaptability to different growing conditions [
11]. Breeding efforts often involved the incorporation of genetic material from wild relatives of tomatoes, particularly for enhancing disease resistance. For instance, the introduction of resistance genes from wild species such as
Solanum pimpinellifolium and
Solanum habrochaites has been a significant focus [
4]. One of the most notable contributions to tomato genetic diversity from the U.S. is the development of hybrid varieties. Hybrid tomatoes, first introduced in the mid-20th century, offered improved uniformity, disease resistance, and higher yields [
13]. The creation of hybrids required the development of inbred lines with specific genetic traits, which further expanded the genetic base of U.S. tomato germplasm [
14].
Advances in molecular marker technologies, such as single nucleotide polymorphism (SNP) genotyping and genotyping by sequencing (GBS), have significantly enhanced our understanding of tomato genetic diversity across different regions. In areas where tomatoes were first domesticated, such as Central and South America, higher genetic diversity is maintained [
14], while regions like Europe and North America, where tomatoes were introduced later, often exhibit lower genetic diversity due to the founder effect and subsequent selection pressures [
13]. Despite this, secondary centers of diversity, particularly in Italy and Spain, have emerged, showcasing distinctive morphological variations in their tomato populations [
12]. These molecular tools have been invaluable in providing detailed information about genetic relationships, population dynamics, and complex genetic architectures. They have also facilitated the detection of human-mediated gene flow, historical migrations, and hybridization events, enabling researchers to study demographic processes, hybridization dynamics, and the introgression of adaptive genes across genetic boundaries [
15]. Modern breeding techniques, including marker-assisted and genomic selection, allow breeders to leverage genetic markers linked to specific attributes, accelerating the development of cultivars with better quality, yield, and disease resistance [
6,
10,
16,
17].
Wild relatives of tomatoes have been vital in enhancing the genetic diversity of U.S. tomato germplasm. These wild species offer traits like pest and disease resistance, tolerance to abiotic stresses, and unique fruit qualities that have diminished in cultivated varieties [
4]. U.S. breeders have incorporated these genes through introgression breeding, significantly broadening the genetic base [
18]. Additionally, the U.S. hosts major germplasm collections, such as the National Plant Germplasm System (NPGS), which preserves a wide range of tomato genetic resources [
14]. The present study focuses on tomato germplasm drawn primarily from the U.S.-based USDA-GRIN collection, which represents a significant portion of the global diversity in cultivated tomatoes [
13]. Understanding the genetic diversity and population structure within these accessions is crucial for both conservation and crop improvement efforts [
19]. By applying GBS to derive SNPs, the study explores the genetic relationships, population dynamics, and genetic structure of these tomato populations [
20]. Given the U.S.'s pivotal role in tomato breeding and the development of modern cultivars, analyzing this germplasm provides critical insights that can enhance breeding strategies and ensure the conservation of essential genetic resources for future agricultural challenges [
11].
3. Results
3.1. Single Nucleotide Polymorphism Diversity
In total, 10,724 polymorphic SNPs with less than 50% missing data were extracted from the 276 accessions. Subsequent filtering excluded loci with a minor allele frequency below 1.5%, loci with more than 15% missing data, and loci with more than 35% heterozygosity. After this filtering, the final set comprised 5,162 SNPs of six distinct types: [AG] with 1,583 SNPs (30.67%), [CT] with 1,554 SNPs (30.10%), [GT] with 507 SNPs (9.82%), [AT] with 627 SNPs (12.12%), [AC] with 532 SNPs (10.31%), and [CG] with 359 SNPs (6.98%). Across the entire set of 5,162 loci, the major allele frequency ranged from 0.45 to 0.98 (mean 0.91), gene diversity from 0.002 to 0.59 (mean 0.12), heterozygosity from 0.00 to 0.27 (mean 0.08), and polymorphism information content (PIC) from 0.001 to 0.49 (mean 0.09). These findings indicate that gene flow and genetic diversity are represented within the 276 tomato accessions.
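The per-locus statistics and filtering thresholds reported above can be illustrated with a minimal Python sketch. The text does not state which estimators or software produced these values, so the textbook definitions used here (gene diversity as 1 minus the sum of squared allele frequencies, and the Botstein-style PIC) are assumptions, as are the function names `locus_summary` and `passes_filters`. Software that applies sample-size or inbreeding corrections may give slightly different numbers.

```python
from collections import Counter
from itertools import combinations


def locus_summary(genotypes):
    """Summarise one SNP locus.

    genotypes: list of 2-tuples of allele codes, e.g. ("A", "G");
    None marks a missing call. (Hypothetical input format.)
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("locus has no called genotypes")

    counts = Counter(allele for g in called for allele in g)
    total = sum(counts.values())
    freqs = sorted((c / total for c in counts.values()), reverse=True)

    # Gene diversity (expected heterozygosity): 1 - sum(p_i^2)
    gene_diversity = 1.0 - sum(p * p for p in freqs)
    # PIC: gene diversity minus the sum over allele pairs of 2 * p_i^2 * p_j^2
    pic = gene_diversity - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))

    return {
        "major_af": freqs[0],
        "minor_af": freqs[-1] if len(freqs) > 1 else 0.0,
        "gene_diversity": gene_diversity,
        "heterozygosity": sum(a != b for a, b in called) / len(called),
        "pic": pic,
        "missing": 1.0 - len(called) / len(genotypes),
    }


def passes_filters(stats, min_maf=0.015, max_missing=0.15, max_het=0.35):
    """Apply the three thresholds described in the text (assumed inclusive)."""
    return (
        stats["minor_af"] >= min_maf
        and stats["missing"] <= max_missing
        and stats["heterozygosity"] <= max_het
    )


# Toy locus: 8 A/A, 1 A/G, 1 G/G (made-up data, not from the study)
demo = [("A", "A")] * 8 + [("A", "G")] + [("G", "G")]
stats = locus_summary(demo)
print(stats, passes_filters(stats))
```

Averaging each statistic over all retained loci would give summary values of the kind reported for the 5,162 SNPs.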
3.2. Population Structure
Evaluation of the 276 tomato accessions/cultivars with STRUCTURE 2.3.4 [
25] showed that delta K peaked at K = 3, indicating three main populations (clusters Q1, Q2, and Q3) (
Figure 1A,B). With a likelihood threshold of 0.55, 266 of the 276 accessions (96.4%) were assigned to one of the three populations: 27 (10.2% of the assigned accessions) to Q1, 201 (75.6%) to Q2, and 38 (14.2%) to Q3 (
Figure 1B,
Supplementary Table S1). The other ten (3.6%) were categorized as admixtures (
Supplementary Table S1). Neighbor-joining cluster analysis likewise yielded three groups (
Figure 1C), and PCA (
Figure 1D) was also consistent with three genetically well-differentiated tomato populations plus a small number of scattered admixed accessions.
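The threshold-based assignment described above can be sketched as follows. This is an illustration only: it assumes the 0.55 threshold is applied to each accession's largest membership coefficient (Q value) from the STRUCTURE output and that the comparison is inclusive. The function name `assign_population` and the example Q values are hypothetical.

```python
def assign_population(q_values, threshold=0.55):
    """Return the cluster with the largest membership coefficient,
    or "admixture" if that coefficient is below the threshold.

    q_values: dict mapping cluster label -> membership coefficient.
    """
    best = max(q_values, key=q_values.get)
    return best if q_values[best] >= threshold else "admixture"


# Made-up membership coefficients for two accessions
examples = {
    "accession_1": {"Q1": 0.05, "Q2": 0.90, "Q3": 0.05},
    "accession_2": {"Q1": 0.50, "Q2": 0.30, "Q3": 0.20},
}
for name, q in examples.items():
    print(name, assign_population(q))
```

Here the first accession is assigned to Q2, while the second has no coefficient reaching 0.55 and is labeled as an admixture.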
3.3. The Accessions from Different Geographic Origins
Initially, the 276 tomato accessions were categorized into six groups based on geographic origin: North America excluding the USA, USA, Central and South America, Europe, Asia, and Australia. "Central and South America" was designated as a distinct category because the region is recognized as the likely origin of tomato domestication [
14,
32,
33]. Meanwhile, cultivars from the United States were separated out in acknowledgement of their advanced development for cultivation (
Table 1,
Supplementary Table S2).
Table 1 summarizes tomato genetic parameters according to region of origin.
Notably, the USA and Central and South America were the most represented, with a combined 218 accessions (79%), while North America excluding the USA, Europe, Asia, and Australia together comprised only 58 accessions (21%). Thus, the majority of "North America" accessions originated in the USA. Gene diversity varied among regions, ranging from 0.004 in Australia to 0.28 in Central and South America. Heterozygosity was low outside Central and South America, at 0.01 in Australia and 0.04 elsewhere, suggesting that most loci in these regions are fixed. Accessions from Central and South America displayed higher heterozygosity (0.10). The PIC ranged from 0.003 in Australia to 0.24 in Central and South America, mirroring gene diversity and indicating that Australian accessions have the least variation, while those from Central and South America harbor the most. Construction of a neighbor-joining phylogenetic tree based on the Cavalli-Sforza (CS) chord distance revealed three clusters: Cluster 1 comprising Asia, North America excluding the USA, and Central and South America; Cluster 2 comprising only the USA; and Cluster 3 consisting of Europe and Australia (
Figure 2). Accessions within the same phylogenetic cluster share more genetic background with one another than with accessions in other clusters.
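To make the distance underlying the regional tree concrete, the sketch below computes a chord distance between two groups from their per-locus allele frequencies. It assumes that "CS Chord" refers to the Cavalli-Sforza and Edwards chord distance and that the multi-locus value is the average of per-locus distances. The text confirms neither detail, and the function name and input layout are hypothetical.

```python
import math


def cs_chord_distance(pop_x, pop_y):
    """Average per-locus chord distance between two populations.

    pop_x, pop_y: lists of dicts (one per locus, same order) mapping
    allele -> frequency within that population.
    """
    if not pop_x or len(pop_x) != len(pop_y):
        raise ValueError("populations must share the same non-empty locus list")

    total = 0.0
    for fx, fy in zip(pop_x, pop_y):
        alleles = set(fx) | set(fy)
        # Sum of sqrt(x_i * y_i) over alleles (cosine of the angle between groups)
        cos_theta = sum(math.sqrt(fx.get(a, 0.0) * fy.get(a, 0.0)) for a in alleles)
        # Clamp at zero to absorb floating-point rounding
        total += math.sqrt(2.0 * max(0.0, 1.0 - cos_theta))
    return 2.0 * total / (math.pi * len(pop_x))


# Made-up allele frequencies at two loci for two regions
region_a = [{"A": 0.9, "G": 0.1}, {"C": 0.6, "T": 0.4}]
region_b = [{"A": 0.7, "G": 0.3}, {"C": 0.2, "T": 0.8}]
print(round(cs_chord_distance(region_a, region_b), 4))
```

A pairwise matrix of such distances across the six regions is the kind of input a neighbor-joining algorithm would take to build a tree.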
Table 2 summarizes the distribution of these distinct populations among the six regions.
All three STRUCTURE populations were represented in accessions from the USA, Asia, and Central and South America; meanwhile, only two (Q2 and Q3) were identified in North America excluding the USA, and only one (Q2) in Europe and Australia. This pattern suggests that geographic origin influences tomato genetic diversity and population structure.
3.4. Phylogeny of the Accessions Across Diverse Countries
Of the 33 countries from which accessions were obtained, 11 were represented by four or more accessions, comprising 236 accessions (
Supplementary Table S1). The genetic diversity of this subset was further examined according to country of origin.
Figure 3 shows the phylogenetic tree obtained when grouping accessions by country.
The countries were clearly divided into two groups: Cluster 1 comprising two countries from Central and South America (Peru and Ecuador), two from North America excluding the USA (Canada and Mexico), one from Europe (France), and the USA; and Cluster 2 comprising two countries from Central and South America (Chile and Brazil), two from Asia (China and Russia), and one from Europe (Italy). Overall, these two clusters are comparable to the region-based grouping, except in the placement of the USA, France, and Italy.
Author Contributions
Conceptualization, A.S. and H.X.; methodology, A.S. and H.X.; software, A.S. and H.X.; validation, A.S., H.X., and H.A.; formal analysis, A.S. and H.X.; investigation, I.A., A.S., Q.L., K.C., Y.Q., and R.D.; resources, K.-S.L.; data curation, A.S., H.X., I.A., and H.A.; writing—original draft preparation, I.A., H.X., and H.A.; writing—review and editing, K.C., Q.L., Y.Q., R.D., and A.S.; visualization, A.S. and H.X.; supervision, A.S. and H.X.; project administration, A.S. and K.-S.L.; funding acquisition, A.S. All authors have read and agreed to the published version of the manuscript.