1. Introduction
Vitamin D is a crucial micronutrient that plays a vital role in maintaining bone health. Deficient levels of Vitamin D, indicated by a serum 25-hydroxy-vitamin D (25(OH)D) level below 20 ng/mL (50 nmol/L), are prevalent worldwide. This deficiency has been associated with musculoskeletal disorders, like osteoporosis and rickets in children, as well as cardiovascular diseases, and cancer [
1]. The elderly population is particularly susceptible, owing to factors such as decreased Vitamin D synthesis in the skin, inadequate consumption of Vitamin D-rich products, and intestinal malabsorption [
2].
Genetic predisposition can also impact Vitamin D levels in addition to the conventional risk factors [
3]. Studies have reported heritability estimates of genetic variations in 25(OH)D levels around 80%, underscoring the substantial influence of genetic factors on individual differences [
4]. Genome-wide association studies (GWAS) have successfully identified specific genetic variants associated with 25(OH)D levels in genes involved in the synthesis and transport of Vitamin D. Notable genes include
CYP2R1 (Cytochrome P450 Family 2R1),
CYP24A1 (Cytochrome P450 Family 24A1),
DHCR7 (7-Dehydrocholesterol Reductase), and
GC (Vitamin D Binding Protein) [
5,
6,
7,
8]. Furthermore, studies have also determined a contribution of genetic variants, including
CYP2R1 and
DHCR7, to the variability in response to Vitamin D supplements [
9].
Surprisingly, despite the abundant sunshine in the Middle East, there is a significant prevalence of Vitamin D deficiency, with rates reaching up to 74% [
10,
11] and approximately 63% in Lebanon [
12]. However, most of the genetic studies on Vitamin D have predominantly focused on individuals of European ancestry, leaving a knowledge gap in the impact of genetic variables in Middle Eastern populations. We performed the first whole exome-based GWAS (ExWAS) of Vitamin D in an elderly Middle Eastern population. The ExWAS approach provides an enhanced ability to detect rare variants within protein-coding genes, thus enabling a more effective analysis of the genetic architecture underlying Vitamin D deficiency in this population [
13]. To identify more novel and common genetic determinants of Vitamin D, we conducted meta-analysis of our cohort with the largest European GWAS (n = 417,580 individuals) [
6]. Additionally, we assessed the effectiveness of European-derived polygenic risk scores (PRS) along with the correlation of genetic markers and Vitamin D deficiency in the elderly Lebanese individuals.
4. Discussion
Our study represents the first known attempt to identify common and new genetic variants that are linked to Vitamin D levels in elderly individuals from the Middle East. Through a whole exome analysis of 194 elderly Lebanese individuals, we discovered essential loci with suggestive evidence of an association with Vitamin D levels. The main findings of our study deviate from previous GWAS, where the most significant associated loci are located within four major genes, namely
GC,
CYP2R1,
CYP24A1, and
DHCR7 [
26]. Our research sheds light on previously unidentified genes that potentially contribute to the process of intestinal absorption of Vitamin D. Importantly, this observation may be influenced by the specific characteristics of our cohort, which primarily consists of elderly individuals. Further validation of our results in independent populations, with an expanded sample size and utilizing a GWAS approach, would be advantageous. Indeed, such replication is the most reliable method to verify the validity of our findings.
In our study, the top genetic predisposition was the novel
MGAM locus, rs141064014, with no indication of LD. The
MGAM encodes the maltase-glucoamylase protein belonging to the glycoside hydrolase family 31 that breaks down complex carbohydrates in the small intestine [
27]. In elderly individuals, reduced
MGAM activity may exacerbate age-related changes in the digestive system, leading to reduced intestinal and hepatic enzymes, intestinal motility, and malabsorption of nutrients [
28]. Our functional analyses showed that the "C" risk allele significantly reduces the intestinal expression of
MGAM, which is highly elevated in our regional populations compared to other populations. These observations suggest a potential regulatory role in the intestinal absorption of Vitamin D in Middle Easterners, which requires further investigation.
We also identified a novel suggestive SNP, rs7036592, in the
PHF2 gene belonging to the Jumonji C family. It plays a role in various physiological processes, including the regulation of gene expression, different tissue functions, metabolism, and adipogenesis [
29]. Interestingly, there is evidence that
PHF2 might be involved in regulating Vitamin D metabolism and signalling [
30].
PHF2 has been shown to physically interact with and regulate the activity of
CYP27B1, thereby affecting the production of 1,25(OH)2D [
31]. On the other hand,
PHF2 was found to interact with and enhance the transcriptional activity of the Vitamin D receptor in osteoblasts, suggesting a potential role in regulating Vitamin D-dependent gene expression [
32]. Recent studies indicate that
PHF2 plays a crucial role in controlling the methylation pattern and subsequent expression of genes responsible for osteoblast differentiation in mice. Moreover, deleting
PHF2 in mice results in inadequate bone formation [
33]. The high expression of
PHF2 in bone tissues suggests direct links with Vitamin D in regulating osteoblast differentiation that requires further investigation.
The heritability of Vitamin D in Middle Eastern populations remains unknown and requires further investigation. Previous GWAS has approximated the heritability of 25(OH)D to be between 7.5% to 16% among Europeans [
5,
8]. We found that the SNP-based heritability of 25(OH)D in the Lebanese group surpassed the estimation observed in UK Biobank participants, with a higher estimate of approximately 29%. This observation may be due to several factors, including study design, genetic diversity, and environmental exposures, which lead to increased heritability estimates [
8]. Further research is needed to understand the underlying mechanisms driving these differences.
To ensure the validity and reliability of our findings, we examined data from the UK Biobank since the frequency and impact of alleles may differ across populations. Our analysis of this dataset enabled us to confirm several Vitamin D-related variants identified in the GCST90000616 study, indicating the consistency of our results. Nonetheless, we acknowledge that differences in the genetic backgrounds of study populations and environmental exposures may have led to variants with opposing effect sizes. In order to better understand the reasons for opposing effect sizes and insufficient statistical power, further investigation may be needed, including studies in larger and more diverse populations.
To enhance the statistical significance of our observations, we merged our findings from Lebanese samples with data from the largest European GWAS [
7]. Our meta-analysis has revealed several SNPs associated with Vitamin D levels that were not previously considered in the GWAS catalog (
Table S2). Furthermore, we have confirmed the replication of multiple SNPs from the GCST90000616 study (
Table S3), including a missense variant, rs2725405, located in the
SLC38A10 gene. This gene is responsible for regulating protein transportation, synthesis, and cellular stress responses. In some cases, SLC38A10 protein can also mediates the intestinal transportation of some Vitamins into the blood and lipid metabolism [
34]. Notably, the expression of
SLC38A10 decreases in the presence of “C” risk allele in the intestinal and skin. The enrichment of
SLC38A10 in intestinal and skin tissues and regional populations, suggesting possible mechanisms in Vitamin D absorption and metabolism. However, more research is needed to understand how it may impact Vitamin D status.
The comprehensiveness of the UK Biobank cohort provided the possibility of deriving PRS for Lebanese individuals from European populations. The effectiveness of European-derived PRS in the Lebanese population was lower than that of estimated in Europeans. This variation in performance might be due to various factors, such as differences in variant effect sizes and frequencies of causal alleles across ethnicities as well as the number of genetic variants and participants utilized in the study [
35]. This emphasizes the need for a larger genome-wide association study tailored specifically for the Middle Eastern population to improve the performance of PRS estimation. Nevertheless, our polygenic risk score model for Vitamin D was able to predict Vitamin D deficiency efficiently in the Lebanese cohort. The modest performance of the PRS in predicting Vitamin D levels in individuals of European and Lebanese descent, with R values of 0.31 and 0.2033, respectively, may suggest the influence of non-genetic factors associated with Vitamin D deficiency, such as inadequate sunlight exposure and lifestyle/environmental factors. Therefore, the development of a more effective risk score tool for Vitamin D levels may require the incorporation of such factors.
Figure 1.
Manhattan plot and Q-Q plot of the ExWAS results for Vitamin D levels. A, Manhattan plot displays chromosomal positions of genetic variants, and the –log10 P-value with a horizontal grey line represents the top signals (P-value < 1 x 10-5). The plot shows novel genomic regions on chromosome 7 and 9 that exceeds the significance threshold of Vitamin D ExWAS (n = 481,395 variants). B, Q-Q plot displays a good fit between the observed –log10 P-values and the expected –log10 P-values, indicating that the ExWAS results are not biased and are consistent with the null hypothesis.
Figure 1.
Manhattan plot and Q-Q plot of the ExWAS results for Vitamin D levels. A, Manhattan plot displays chromosomal positions of genetic variants, and the –log10 P-value with a horizontal grey line represents the top signals (P-value < 1 x 10-5). The plot shows novel genomic regions on chromosome 7 and 9 that exceeds the significance threshold of Vitamin D ExWAS (n = 481,395 variants). B, Q-Q plot displays a good fit between the observed –log10 P-values and the expected –log10 P-values, indicating that the ExWAS results are not biased and are consistent with the null hypothesis.
Figure 2.
Regional Plot for the top novel regions. LocusZoom plots of strongest correlated SNPs to Vitamin D, A, rs141064014 on chromosome 9, and B, rs7036592 on chromosome 7 (lead SNP–shown in purple diamonds). P-values in-log 10 scale, as in the Manhattan plot, are shown on the left vertical axis, the recombination rates are on the right vertical axis as a blue line, and the chromosomal positions are on the horizontal axis. The bottom panel shows the name and location of genes. Genes within the region are annotated and shown as arrows, and r2 of linkage disequilibrium relationships of each SNP with lead SNP are colored as indicated.
Figure 2.
Regional Plot for the top novel regions. LocusZoom plots of strongest correlated SNPs to Vitamin D, A, rs141064014 on chromosome 9, and B, rs7036592 on chromosome 7 (lead SNP–shown in purple diamonds). P-values in-log 10 scale, as in the Manhattan plot, are shown on the left vertical axis, the recombination rates are on the right vertical axis as a blue line, and the chromosomal positions are on the horizontal axis. The bottom panel shows the name and location of genes. Genes within the region are annotated and shown as arrows, and r2 of linkage disequilibrium relationships of each SNP with lead SNP are colored as indicated.
Figure 3.
Relationship between the genotypes of Vitamin D-associated SNPs and the gene expression enrichment. The bean plots display the normalized intron-excision ratio and their median (indicated by a white horizontal line) and interquartile range (represented by a black box) for A, MGAM rs141064014 expression in small intestine (P-value = 1.9 x 10-21), SLC38A10 rs2725405 expression in B, colon (P-value = 1.8 x 10-48), and C, small intestine (P-value = 1.5 x 10-23). The data presented is derived from the GTEx database.
Figure 3.
Relationship between the genotypes of Vitamin D-associated SNPs and the gene expression enrichment. The bean plots display the normalized intron-excision ratio and their median (indicated by a white horizontal line) and interquartile range (represented by a black box) for A, MGAM rs141064014 expression in small intestine (P-value = 1.9 x 10-21), SLC38A10 rs2725405 expression in B, colon (P-value = 1.8 x 10-48), and C, small intestine (P-value = 1.5 x 10-23). The data presented is derived from the GTEx database.
Figure 4.
European-derived polygenic risk score performance on the Lebanese population. Linear regression of baseline Vitamin D levels and polygenic risk scores (PRS) derived from a large European dataset (PGS000882: R= 0.2033, P-value = 0.0044). The blue line represents the best fit of linear regression analysis.
Figure 4.
European-derived polygenic risk score performance on the Lebanese population. Linear regression of baseline Vitamin D levels and polygenic risk scores (PRS) derived from a large European dataset (PGS000882: R= 0.2033, P-value = 0.0044). The blue line represents the best fit of linear regression analysis.
Figure 5.
Prediction of Vitamin D deficiency using European-derived PRS. Receiver Operating Characteristic (ROC) curve of the European-derived PRS on the Lebanese cohort for the prediction of Vitamin D deficiency (25(OH)D < 20 ng/mL). Area under the ROC curve (AUC) is reported in the figure.
Figure 5.
Prediction of Vitamin D deficiency using European-derived PRS. Receiver Operating Characteristic (ROC) curve of the European-derived PRS on the Lebanese cohort for the prediction of Vitamin D deficiency (25(OH)D < 20 ng/mL). Area under the ROC curve (AUC) is reported in the figure.
Table 1.
Baseline characteristics of the enrolled sample population.
Table 1.
Baseline characteristics of the enrolled sample population.
Characteristic |
Male |
Female |
Total |
Age (years) |
72.7 (±5.50) |
69.77 (±3.51) |
71.13 (±4.82) |
BMI (kg/m2) |
28.69 (±3.35) |
31.60 (±5.10) |
30.24 (±4.60) |
Serum 25(OH)D (ng/ml) |
19.36 (±6.25) |
20.83 (±7.93) |
20.12 (±7.22) |
Sample Size |
90 (46.4) |
104 (53.6) |
194 |
Abbreviations: BMI, body mass index; SD, standard deviation. |
Table 2.
Variants identified in exome-wide association analysis for 25-hydroxyVitamin D levels.
Table 2.
Variants identified in exome-wide association analysis for 25-hydroxyVitamin D levels.
SNP |
Gene |
HGVS ID |
CHR |
Position |
A1 |
A2 |
Beta |
SE (Beta) |
P-value |
rs141064014 |
MGAM |
NC_000007.13:g.141736273G>A |
7 |
141736273 |
G |
A |
-2.38 |
0.52 |
4.4E-06 |
rs7036592 |
PHF2 |
NC_000009.11:g.96425777C>T |
9 |
96425777 |
C |
T |
-0.54 |
0.12 |
8.4E-06 |
Table 3.
Allele frequency of the top Vitamin D-variants across different populations.
Table 3.
Allele frequency of the top Vitamin D-variants across different populations.
Populations |
Frequency for rs141064014 in MGAM |
Frequency for rs7036592 in PHF2 |
Frequency for rs2725405 in SLC38A10 |
Lebanese elderly population |
0.0103 |
0.2408 |
0.4845 |
European population of ALFA |
0.00794 |
0.39533 |
0.5729 |
Controls of gnomAD populations |
European |
0.00634 |
0.3829 |
0.5468 |
East Asian |
0.001 |
0.2136 |
0.3467 |
African |
0.001 |
0.2783
|
0.9234 |
All populations |
0.00553 |
0.3216 |
0.5084 |
Abbreviations: gnomAD, Genome Aggregation Database; ALFA, Allele Frequency Aggregator. |