1. Introduction
Soybean (Glycine max (L.) Merr.) is a widely grown oil crop worldwide and an important source of edible oil and high-quality vegetable protein [1]. Depending on harvest time and use, soybean can be divided into two types: food soybean and vegetable soybean. Vegetable soybeans are harvested at the R6 stage, when the pods are green and the seeds are full [2]. Pod size, including large pods and large seeds, has been reported to be an important visual quality trait unique to vegetable soybean [3, 4]. Pod size is therefore considered one of the most important traits for accelerating vegetable soybean breeding. In addition, the yield of vegetable soybean is affected by plant height, pod number, pod width, pod thickness, pod length and fresh pod weight per plant. With economic development and world population growth, the demand for vegetable soybean has increased dramatically [7], but few reports have addressed yield-related traits in vegetable soybean. It is therefore necessary to analyze the genetic basis of soybean yield at the R6 stage (the market maturity stage of vegetable soybean) in order to improve vegetable soybean yield.
Understanding the genetic variation underlying yield traits is a necessary basis for breeding [8]. Plant structure has always been a key factor affecting planting density and seed yield of soybean in the field. The ideal soybean plant structure not only optimizes the canopy structure, prevents lodging and improves photosynthetic efficiency, but also achieves higher yield [7, 8]. As a special type of soybean, vegetable soybean shares its yield traits with cultivated soybean, including yield per plant (YP), pod number (PN), pod length (PL), pod width (PW), pod thickness (PT), plant height (PH) and branch number (BN). Therefore, dissecting the important SNP loci associated with yield traits has become an essential topic in vegetable soybean breeding [9, 10, 11, 12].
Genome-wide association study (GWAS) is a highly effective method for dissecting the natural variation of quantitative traits based on linkage disequilibrium, and it can provide a theoretical basis for analyzing the genetic structure and molecular improvement of complex traits in soybean [13, 14, 15]. For instance, 125 candidate selection regions and five potential candidate genes for agronomic traits were predicted using GWAS [16]. Zhang et al. identified 22 minor-effect loci and predicted three important candidate genes on chromosome 19 using GWAS [17]. A total of 58 SNPs significantly associated with seed weight (SW), internode number (IN) and plant height (PH) were identified by GWAS, from which 28 candidate genes for yield traits were predicted [18]. In addition, 27 quantitative trait nucleotides (QTNs) associated with seed size were identified using GWAS [19, 20]. Although plant architecture and yield traits in soybean have been studied, the mining of yield-related genes in vegetable soybean at the R6 stage (the market maturity stage of vegetable soybean) has rarely been reported.
Therefore, in this study, we collected 188 diverse vegetable soybean genotypes and evaluated them for six yield-related traits: pod number, plant height, fresh pod weight, pod width, pod thickness and pod length. GWAS was then used to identify genetic loci and candidate genes for these yield traits, providing theoretical support for improving the yield of vegetable soybean.
3. Discussion
Vegetable soybean is a very important legume vegetable, especially in Asian countries such as Japan and China. Owing to its superior nutrition, appearance and taste, global demand for vegetable soybean continues to grow. Since the 1990s, demand for vegetable soybean in the United States has been increasing, reaching 10,000 tons in 2000 [22, 23]. However, because good varieties are lacking, this demand cannot be met. China is the country of origin of soybean and holds the largest soybean genetic resources in the world. Drawing on these abundant resources, the genetic structure of vegetable soybean yield was analyzed by GWAS to provide beneficial genes, functional markers and specific materials for molecular breeding.
At present, GWAS has been used to analyze the association between genotypic and phenotypic variation and to dissect the genetic basis of important traits [24, 25, 26]. In the study by Zeng et al. [27], the phenotypic variation of grain yield per plant (GYP) and tassel branch number (TBN) in the association panel was 42.37% and 49.79%, respectively, and the coefficients of variation (CV) of grain width (GW) and 100-kernel weight (HKW) were 11% and 19%, respectively. Other studies have shown a phenotypic variation of 40% for GYP and CVs of 17% for HKW and 28% for kernel number per row (KNR); thus, significant phenotypic variation in a population is beneficial for analyzing the genetic structure of yield traits [27]. The plant structure, especially the number of nodes and branches, largely determines the pod number and yield of soybean. Other structural traits, including pod number and pod size, are further major components that affect the overall plant structure and grain yield. It is therefore of great significance to explore candidate genes for yield-related traits in vegetable soybean for yield evaluation and genetic breeding.
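The coefficient of variation referred to above can be illustrated with a minimal sketch. It assumes the CV is defined as the sample standard deviation divided by the mean, expressed as a percentage; the function name and the pod-length values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation / mean * 100."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m * 100


# Hypothetical pod-length measurements (cm) for five accessions,
# used only to illustrate the calculation.
pod_length_cm = [4.8, 5.2, 5.5, 6.1, 4.4]
cv_pod_length = coefficient_of_variation(pod_length_cm)
print(f"CV of pod length: {cv_pod_length:.1f}%")
```

A larger CV for a trait indicates a wider phenotypic spread across the panel, which is the property the cited studies treat as favorable for association analysis.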
In this study, population structure analysis divided the 188 vegetable soybean accessions into nine groups, indicating that there was some variation within the population. These results were consistent with the phylogenetic analysis, and accounting for this structure helps prevent false positives in the GWAS results [28]. In addition, the acceptable distance between candidate genes and markers is determined by LD, and LD varies among populations [29]. The LD decay distance of the 188 vegetable soybean varieties in this study was 150.00 kb (r² = 0.375), within the previously reported range (90 kb to 574 kb) [30, 31, 32]. This LD decay distance is therefore considered sufficient for a valid GWAS analysis.
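As an illustration of how an LD decay distance can be read from pairwise marker data, the following sketch computes r² as the squared Pearson correlation between genotype dosages (0/1/2) and then reports the first distance bin whose mean r² falls below a threshold. This is an assumed, simplified procedure and not necessarily the one used in this study; the function names, the 50 kb bin width and the toy (distance, r²) pairs are hypothetical, and the threshold of 0.375 is taken from the r² value quoted above on the assumption that it marks the decay cut-off.

```python
from collections import defaultdict


def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        return 0.0  # monomorphic marker: no LD information
    return cov * cov / (var1 * var2)


def ld_decay_distance(pairs, threshold, bin_kb=50):
    """Return the start (kb) of the first distance bin whose mean r2 is below
    the threshold, or None if no bin falls below it.

    pairs: iterable of (distance_kb, r2) for marker pairs.
    """
    bins = defaultdict(list)
    for dist_kb, r2 in pairs:
        bins[int(dist_kb // bin_kb)].append(r2)
    for b in sorted(bins):
        if sum(bins[b]) / len(bins[b]) < threshold:
            return b * bin_kb
    return None


# Toy (distance in kb, r2) pairs chosen only for illustration.
toy_pairs = [(10, 0.80), (40, 0.60), (60, 0.50), (90, 0.45),
             (120, 0.40), (140, 0.38), (160, 0.35), (190, 0.30)]
print(ld_decay_distance(toy_pairs, threshold=0.375))
```

In practice the pairwise r² values would come from a dedicated LD tool run on the full SNP set, and the decay curve is often smoothed rather than binned.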
A QTL region related to plant height was previously reported at Chr13:21,937,082-23,937,081 [31, 32]. In this study, a SNP (Chr13:22150035) identified for the plant height trait is also located in this interval, and Glyma.13G109100 is located 154.23 kb away from Chr13:22150035. In addition, Glyma.13G109100 showed a higher expression level in the long-stalked variety than in the short-stalked variety in our study (Figure 5), indicating that it may be differentially regulated. This gene encodes a MYB-related transcription factor, a member of a highly conserved transcription factor family in eukaryotes that is involved in many developmental processes, such as root hair development, pollen formation, seed germination, flower stem strength and yield. The family also plays a role in responses to abiotic stresses such as drought, ultraviolet light, cold, high temperature and salt [33, 34]. MYB-related transcription factors can also reduce cell size and thereby regulate plant structure, such as internode length, petiole length, leaf area and plant height, via the brassinosteroid (BR) pathway [33]. Studies have also shown that OsMYB110 not only changes plant height but also increases rice yield by increasing the number of grains per panicle and grain size [35]. Therefore, Glyma.13G109100 was predicted to be a candidate gene regulating plant height in vegetable soybean.
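The positional overlap described above can be checked with a short sketch that uses only the coordinates quoted in the text. The helper names are hypothetical, and the coordinate strings are assumed to follow the "Chr:start-end" and "Chr:position" forms used in this section.

```python
def parse_region(region):
    """Parse 'Chr13:21,937,082-23,937,081' into (chrom, start, end)."""
    chrom, span = region.split(":")
    start, end = span.replace(",", "").split("-")
    return chrom.lower(), int(start), int(end)


def parse_snp(snp):
    """Parse 'Chr13:22150035' into (chrom, position)."""
    chrom, pos = snp.split(":")
    return chrom.lower(), int(pos.replace(",", ""))


def snp_in_region(snp, region):
    """Return True if the SNP lies on the same chromosome and inside the interval."""
    snp_chrom, pos = parse_snp(snp)
    reg_chrom, start, end = parse_region(region)
    return snp_chrom == reg_chrom and start <= pos <= end


# Coordinates as quoted in the text above.
PH_QTL = "Chr13:21,937,082-23,937,081"
PH_SNP = "Chr13:22150035"
print(snp_in_region(PH_SNP, PH_QTL))
```

Chromosome names are compared case-insensitively because the text writes both "Chr13" and "chr13".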
A recent report identified 294 seed-weight-related SNPs on the soybean chromosomes (http://www.soybase.org/) [36]. In addition, some SNPs have been identified that are correlated with yield traits such as seed size, maturity, flowering time and plant height. For seed weight, Chr09:18491673 overlaps with a previously reported SNP locus that controls seed weight [38]. Combined with gene annotation information, Glyma.09G102300 and Glyma.09G102200 were identified as candidate genes controlling seed yield in vegetable soybean. Of these, Glyma.09G102300 encodes an F-box-related protein. F-box proteins are a family of proteins that exist widely in eukaryotes and contain the F-box domain; they play an important role in cell cycle regulation, transcriptional regulation, apoptosis, cell signal transduction, growth and development [39,
40]. In addition, studies have shown that the F-box proteins OsFBX206 and OsFBK12 can regulate the synthesis of ethylene and brassinosteroids to improve grain size and yield in rice [41, 42, 43]. We also found that the NADH dehydrogenase 1 alpha subcomplex was predicted to be an interacting protein of the fresh pod weight candidate gene
Glyma.09G102300. The stability of NADH dehydrogenase 1 has been shown to play an important role in seed germination, yield and growth retardation in Arabidopsis and maize [44, 45]. In addition, as a candidate gene related to seed yield of vegetable soybean, Glyma.09G102200 encodes a cytochrome P450, a member of a large family of self-oxidizing heme proteins. As a member of this family, CYP78A has been shown to regulate grain morphology and yield per plant in Arabidopsis [46], soybean [47], rice [48] and wheat [49]. Therefore, Glyma.09G102300 and Glyma.09G102200 were predicted to be candidate genes regulating pod weight in vegetable soybean.