1. Introduction
Identifying predictors of all-cause mortality is essential to improve risk assessment in medical decision-making and to elucidate the pathways leading to disease outcomes. Studies with detailed longitudinal clinical data surrounding death offer an opportunity to better understand the risk factors for mortality. Metabolic biomarkers for all-cause mortality reflect multimorbidity among middle-aged and older people rather than specific diseases alone [1]. However, our understanding of the metabolic changes that underlie mortality and the aging process remains incomplete.
Previous studies on all-cause mortality have focused mainly on clinical and laboratory measurements or on the identification of metabolic biomarkers for specific diseases and conditions, including cardiovascular diseases, type 2 diabetes, and chronic kidney disease [2,3,4,5,6]. Three previous studies identified metabolic biomarkers for all-cause mortality by applying nuclear magnetic resonance (NMR) spectroscopy. The strength of these studies lies in their large sample sizes, which have allowed for the replication of findings in other cohorts. Their limitation is that the number of metabolites measured was low, from 98 to 226 [7,8,9]. NMR is less sensitive than liquid chromatography-tandem mass spectrometry (LC-MS/MS). The LC-MS/MS method detects a large pool of metabolites (> 1,000), and therefore it plays a dominant role in the metabolomics field. Mass spectrometry is intrinsically a highly sensitive method for the detection, quantitation, and structure elucidation of metabolites [10]. Wang et al. were the first to apply an LC/MS approach to investigate the association of 243 metabolites with mortality in 13,512 individuals and found that higher levels of N2,N2-dimethylguanosine, pseudouridine, N4-acetylcytidine, 4-acetamidobutanoic acid, N1-acetylspermidine, and lipids with fewer double bonds were associated with an increased risk of all-cause mortality [3].
Previous studies seeking metabolic biomarkers for mortality have applied conventional statistics, which have limitations due to high internal correlations, class diversity, and exposure-outcome disparities. Machine learning (ML), a branch of artificial intelligence, is well suited to mortality studies because it focuses on empirical prediction of an outcome, in contrast to traditional statistical methods [11]. Several methods, including ML tools, have been applied to metabolomics to create clinical prediction models. ML methods can analyze thousands of predictors effectively by optimizing predictive performance while capturing complicated patterns in the data, including non-linear relationships. They are especially well suited to studies applying metabolomics to mortality data, as the mechanisms of action and interactions between the metabolites are biologically diverse and interconnected [11].
We hypothesized that measuring metabolites on the LC-MS/MS platform and applying conventional statistical methods in parallel with ML tools improves the identification of metabolites associated with all-cause mortality. Our study is the first to apply an LC-MS/MS metabolomics-based method together with ML to investigate metabolites associated with all-cause mortality in a large population-based cohort including 10,197 men.
3. Discussion
Previous metabolomics studies of all-cause mortality have been heterogeneous in sample size, the number of metabolites measured, the platforms used to measure metabolites, and the statistical methods applied. We applied ML tools (SVM, XGBoost, logistic regression) to identify the most impactful metabolites associated with mortality and identified 32 metabolites, of which 25 increased and 7 decreased the risk of mortality. Twenty of these metabolites were novel, covering several metabolic pathways: lipids, amino acids, carbohydrates, xenobiotics, energy metabolism, nucleotides, endocannabinoids, and peptides. These metabolites are known to be associated with damage in key human body systems, including the cardiovascular, renal, respiratory, endocrine, and central nervous systems (Figure 3).
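To illustrate how a model such as logistic regression can separate risk-increasing from risk-decreasing metabolites, the following is a minimal sketch in pure Python. It is not the pipeline used in this study: the data are synthetic, the metabolite names and effect sizes are hypothetical, and the simple gradient-descent fit stands in for whatever implementation and tuning were actually used. The sign of each fitted coefficient gives the direction of association, and its absolute value gives a rough ranking when the predictors are on a common scale.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic, hypothetical data: three metabolites drawn from a standard
# normal distribution (so they share a common scale) with assumed effects
# on the log-odds of death.
rng = random.Random(0)
names = ["metabolite_A", "metabolite_B", "metabolite_C"]
true_effects = [2.0, -1.0, 0.0]  # assumed for illustration only
X = [[rng.gauss(0, 1) for _ in names] for _ in range(400)]
y = [
    1 if rng.random() < sigmoid(sum(t * x for t, x in zip(true_effects, row)) - 1.0) else 0
    for row in X
]

weights, bias = fit_logistic(X, y)

# Rank metabolites by absolute coefficient; the sign gives the direction.
ranked = sorted(zip(names, weights), key=lambda kv: abs(kv[1]), reverse=True)
for name, coef in ranked:
    direction = "increases" if coef > 0 else "decreases"
    print(f"{name}: coefficient {coef:+.2f} ({direction} modeled risk)")
```

In practice, coefficient-based rankings of this kind are sensitive to correlated predictors, which is one reason to compare several models rather than rely on a single one.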
When we compared our findings with the two previous large studies, we found that only two of our metabolites had been reported in them: histidine, associated with decreased mortality in the study of Deelen et al. [9], and 3-ureidopropionate, associated with increased mortality in the study of Wang et al. [4]. The number of metabolites measured in these studies was very different. Our study included > 1,000 metabolites, whereas the study of Deelen et al. [9] included 226 metabolites and the study of Wang et al. [4] 243 metabolites.
In our study, seven of the metabolites were linked to damage in multiple body systems, including three novel metabolites (3-amino-2-piperidone, C-glycosyltryptophan, 5-(galactosylhydroxy)-L-lysine) and four previously reported metabolites (N-acetylphenylalanine, homocitrulline, homoarginine, 5-hydroxyhexanoate) [12,13,14,15]. Disruptions in the ornithine cycle result in an increased abundance of 3-amino-2-piperidone (Figure S8A), resulting in enhanced coagulation [16]. Hypercoagulation increases the risk of myocardial infarction, stroke, pulmonary embolism, pulmonary infarction, and renal thrombosis [17].
N-acetylphenylalanine and C-glycosyltryptophan have been associated with albuminuria [18] and cardiovascular mortality. C-glycosyltryptophan accelerates peripheral artery disease in patients with type 2 diabetes and is associated with a decrease in kidney function, pulmonary hypertension, and impaired lung function [19,20]. Increased concentrations of 5-(galactosylhydroxy)-L-lysine, a glycosylation product of hydroxylysine (Figure S8A), have been found in patients with pulmonary artery hypertension and in patients with impaired kidney function [20,21].
Homocitrulline, a carbamylation product, has been reported to be associated with morbidity and mortality from chronic heart failure, coronary artery disease, and chronic kidney disease [22,23]. Cyanate-induced carbamylation generates homocitrulline from lysine (Figure S8A). Elevated cyanate concentrations related to impaired kidney function and inflammation increase homocitrulline concentration [24]. Carbamylation prevents LDLC binding to its receptor, resulting in cholesterol accumulation, macrophage foam-cell formation, and an increased risk of coronary artery disease [25].
Lysine can replace ornithine in the urea cycle and combine with arginine to form homoarginine (Figure S8A). Homoarginine was inversely associated with mortality in our study, in agreement with the findings of the LURIC and 4D studies [26]. Homoarginine acts as a nitric oxide precursor, enhancing endothelial function [26]. Elevated homocitrulline and decreased homoarginine result in disruption of the lysine pathway and increase the risk of mortality [15].
We found 22 metabolites known to impair specific body systems. Of these, seven novel metabolites (9-hydroxystearate, 3-hydroxyadipate, sphinganine, lignoceroyl-SM, SM (d18:1/25:0), behenoyl dihydro-SM, suberoylcarnitine) and one previously reported metabolite, caprate [15], were linked to coronary artery disease (Figure S9). Hydrofluoroalkanes, 9-hydroxystearate and 3-hydroxyadipate can be incorporated into chylomicrons, which contribute to an increase of very low-density lipoprotein particles. Additionally, oxidized LDLC plays an important role in atherosclerosis by inducing monocyte chemotactic protein 1 and scavenger receptors [27], resulting in pro-inflammatory mechanisms.
Sphinganine, a ceramide precursor (Figure S8B), inhibits LDLC esterification and contributes to the accumulation of free cholesterol in perinuclear vesicles, resulting in cellular toxicity and death [28]. Cholesterol accumulation releases proteases, cytokines, and prothrombotic molecules, contributing to plaque instability, rupture, and vascular occlusion [29]. Three sphingomyelins (lignoceroyl-sphingomyelin, sphingomyelin (d18:1/25:0), behenoyl dihydro-sphingomyelin) were associated with decreased all-cause mortality in our study. Sphingomyelins are crucial for cell membrane structure, and they prevent the deleterious effects of ceramides on endothelial dysfunction, cell apoptosis, and atherosclerosis [30].
Suberoylcarnitine, a medium-chain dicarboxylic acylcarnitine, increases the risk of coronary artery disease attributable to altered mitochondrial fatty acid oxidation and omega-oxidation [31]. Caprate, a saturated fatty acid, has been reported to be associated with increased mortality [32]. Saturated fatty acids increase coagulation, inflammation, insulin resistance, and the risk of type 2 diabetes, cardiovascular diseases, cancer, frailty, and all-cause mortality [33].
We found two metabolites linked to the cardiovascular system: one novel association with 5-hydroxymethyl-2-furoylcarnitine and one previously reported association with malate [12] (Figure S9). 5-hydroxymethyl-2-furoylcarnitine, a dietary component, has been associated with ischemic heart disease [34]. Two metabolites in our study impair the renal system (Figure S10): one novel association with hydroxyasparagine and one previously reported association with 3-ureidopropionate (3-UPA) [22]. 3-UPA (Figure S8C) increases mortality independently of kidney disease in patients with liver cirrhosis [35].
We confirmed that N-acetylcarnosine and histidine decreased the risk of mortality [26,36]. N-acetylcarnosine and histidine are carnosine metabolites (Figure S8C) known for their antioxidative properties [37]. These metabolites effectively inhibit glucose-induced oxidation and glycation in human LDL, countering aging-related changes in protein oxidation, glycation, and the formation of advanced glycation end-products (AGEs) [38].
We discovered two novel metabolites linked to respiratory system damage, 1-methyl-4-imidazoleacetate and 2-hydroxyfluorene sulfate (Figure S11). 1-methyl-4-imidazoleacetate is the main histamine metabolite (Figure S8C) and increases significantly during asthma attacks [39]. Tobacco smoking increases the concentration of 2-hydroxyfluorene sulfate, which is a potent carcinogen in tobacco [40]. We also identified a novel metabolite, oleoylethanolamide, which impacts the central nervous system (Figure S6). Oleoylethanolamide induces anorexia by stimulating vagal sensory nerves and activating PPAR-alpha [41]. Anorexia is associated with an elevated risk of all-cause mortality [42].
We found three novel metabolites impacting the endocrine system, S- and R-3-hydroxybutyrylcarnitine (S-3HB and R-3HB) and mannose (Figure S10), and confirmed the previously reported association with N-acetylglucosamine [43]. R-3HB-carnitine contributes to insulin resistance in mice and can cause hypoketotic hypoglycemia, metabolic acidosis, hyperammonemia, and fatty liver disease [44]. Mannose glycates proteins and enhances the formation of AGEs in several diseases, including diabetic nephropathy, atherosclerosis, and neurodegenerative diseases [45]. N-acetylglucosamine/N-acetylgalactosamine generates GlycA, which is associated with cardiovascular diseases and diabetes [46].
We found that the metabolite signatures associated with short-term, intermediate-term, and long-term mortality were very different. Particularly notable is the difference between short-term and long-term mortality, as only three metabolites were shared between these groups. Metabolites associated with short-term mortality reflect acute stress and energy metabolism. N1-methyladenosine is required for RNA methylation and rapid cellular stress adaptation [47]. Lactate and succinate are involved in acute stress responses and fast metabolic energy [48]. Succinate, a key metabolite in the Krebs cycle, activates hypoxia signaling [49]. In contrast, the metabolites associated with long-term mortality, such as dehydroepiandrosterone sulfate (DHEA-S) and beta-cryptoxanthin, regulate chronic inflammation and oxidative stress. A decrease in DHEA-S concentration increases inflammation and has an impact on long-term health [50]. Beta-cryptoxanthin has antioxidant effects and is protective against oxidative stress [51].
The main causes of death in our study were cancers (28%) and cardiovascular diseases (25%). Interestingly, we did not find any metabolite associated with the risk of cancer, whereas 13 metabolites were associated with cardiovascular diseases (myocardial infarction, coronary artery disease, heart failure, pulmonary artery hypertension). This suggests that these metabolites could serve as markers for the risk of cardiovascular diseases.
In summary, ML successfully identified a precise set of metabolites associated with the risk of all-cause mortality, emphasizing the significant role of metabolism in aging and different diseases. Most of the 32 metabolites we discovered were novel and regulated coagulation, cytokine release, lipid oxidation, inflammation, cellular toxicity, insulin resistance, urea and malate-aspartate cycle dysregulation, and especially the risk of cardiovascular diseases. Many of these metabolites can simultaneously harm multiple body systems (Figure S12). These metabolites may offer a more accurate representation of general health than traditional clinical parameters and laboratory measurements.
Our study has several strengths, including a large METSIM cohort, a validated metabolomics platform with > 1,000 metabolites, a long follow-up time, several novel findings, and robust data analysis. Although ML methods, logistic regression, Welch’s test, and XGBoost together offer a comprehensive approach to identifying metabolites associated with all-cause mortality, they also have weaknesses. A limitation of our study is that it included only middle-aged and elderly Finnish men, and the applicability of our findings to women, other age groups, and diverse ethnicities remains to be investigated. Additionally, our findings do not show causality between metabolites and different diseases. Overall, our study identified potentially important metabolites and metabolic pathways for future research to reveal mechanisms leading to mortality.
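For readers unfamiliar with Welch’s test mentioned above, the following is a minimal sketch of how the statistic and its degrees of freedom are computed for a single metabolite. The values are hypothetical standardized concentrations invented for illustration, not data from this study, and the function name `welch_t` is our own.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    na, nb = len(a), len(b)
    # Squared standard errors of the two group means (sample variance / n).
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical standardized levels of one metabolite in two groups.
deceased = [1.2, 0.8, 1.5, 1.1, 0.9]
survivors = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2]

t, df = welch_t(deceased, survivors)
print(f"t = {t:.2f}, df = {df:.2f}")
```

Unlike Student's t-test, this version does not assume equal variances in the two groups, which is useful when the deceased and surviving groups differ greatly in size.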