1. Introduction
During the COVID-19 pandemic, millions of people tested positive for the SARS-CoV-2 virus, hundreds of thousands of patients were hospitalized, and in the middle of year 2020, the medical and academic community were already noting Long COVID as an emerging issue in rehabilitation [
1,
2,
3]. Long Covid or post-acute sequelae of SARS-CoV-2 infection (PASC) is a heterogeneous entity with possible long-term symptoms in multiple organ systems, including respiratory, cardiovascular, neurological, and metabolic systems, among others [
4,
5,
6,
7]. Zang and colleagues aimed to characterize PASC using Electronic Health Records (EHR) by comparing ~58 thousand patient data to ~503 thousand control EHR data. They found a significantly increased risk for new-onset diagnoses in COVID-19 patients in 12 months after the acute disease phase when compared to non-COVID-19 patients, including myopathy, dementia, cognitive problems, skin symptoms, pulmonary fibrosis, dyspnea, pulmonary embolism, thromboembolism, diabetes, cystitis, malaise and fatigue, dizziness, joint pain, and others [
4].
Studies worldwide have estimated a high prevalence of long-lasting complications after the COVID-19 acute phase [
6,
7,
8]. A systematic review that included 194 studies (735 006 participants) assessed that 45% of COVID-19 survivors, regardless of hospitalization status, were experiencing unresolved complications for ~4 months [
6]. The high prevalence and unclear disease mechanisms lead to situations where healthcare systems are ill-equipped to deal with these patients without clear evidence-based rehabilitation and therapeutic guidelines for PASC patients.
In patients with severe COVID-19, immune dysregulation is significant, and the relationship between metabolic regulation and immune response is of great interest in determining the pathophysiological mechanisms of PASC [
9,
10]. Quantitative omics approaches, including metabolomics, have been used to understand metabolite profile differences between COVID-19 severity groups and controls, but most of them do not evaluate middle and long-term metabolic changes [
11,
12,
13]. The dysregulation of pathways related to energy production and amino acid metabolism has been observed in several studies [
11,
13]. Changes in blood metabolite levels reflect more complex systemic disturbances induced by SARS-CoV-2, which affect the liver, kidneys, and other tissues. Several studies have presented evidence of plausible metabolic reprogramming of the urea cycle and/or the TCA cycle [
11,
13,
14]. COVID-19 is associated with the development of cardiovascular diseases, including dyslipidemia and, specifically, high-density lipoprotein (HDL) dysregulation [
15,
16]. Another proof of disturbed energy metabolism is reports of increased risk of incident diabetes and incident antihyperglycemic use [
4,
17].
Considering the cardiometabolic nature of PASC, we aimed to characterize the metabolomic footprint of recovering severe COVID-19 patients in three consecutive time points and compare the metabolite levels to controls. We used NMR-based metabolomics to investigate metabolic changes during recovery from severe COVID-19. The blood plasma from hospitalized COVID-19 patients was collected at three timepoints: acute phase, a month later, and three to four months later to study metabolite profiles during recovery from severe COVID-19. To see how these profiles change compared to non-covid samples, we also measured metabolite profiles for population controls (not healthy individuals because of the relatively high prevalence of comorbidities in COVID-19 cases). Our study is designed to address these questions: how do metabolite profiles change during recovery from severe COVID-19 (linear regression for time-series data)? Can we identify significantly changed metabolites for patients with PASC and without (time series + phenotype linear regression)? What metabolites significantly differ between COVID-19 patients in each timepoint and controls (t-test, fold-change analysis)? What are the significant pathways involved in recovery from severe COVID-19?
2. Results
Primarily, we wanted to identify how metabolite profiles for COVID-19 patients change over time during the recovery process after severe disease and hospitalization. We performed limma
linear regression for time series data, using
Subject as a covariate to adjust for repeated within-subject samples, identifying 115 significant metabolites (p-value<0.05) (
Supplementary Table 1B) across all contrasts (Timepoint C as reference group), top50 are visualized in
Figure 1C heatmap. As expected, the largest number (114) of significantly different (p-value<0.05) metabolites were between the Acute phase (timepoint A) and the latest Recovery phase (timepoint C), similarly between the Acute phase (timepoint A) and the Recovery phase (timepoint B) 104 significant metabolites (p-value<0.05) were identified. The severe acute phase of COVID-19 patients can explain the significant difference in these contrasts. The top 10 most significantly changed metabolites between the Acute phase and Recovery phase (timepoint B) were GlycB, Glyc, GlycA, Glyc/SPC, Threonine, Pyruvic acid, Apo-A1, TPA1, H3FC, HDA1. Interestingly, between the Recovery phase time points B and C, we identified 17 statistically significant metabolites, top 10 - Methionine, Histidine, GlycB, Acetic Acid, Pyruvic acid, Glyc, Citric acid, GlycA, Threonine, V4TH, indicating that the recovery process is still ongoing month (35 ± 14.7.65 days) after hospitalization. See
Figure 1B Venn diagram showing the significant metabolite count across three contrasts. Also, Hotelling’s T-squared (Hotelling-T^2) test was performed to detect changes or differences in the multivariate mean of the data over time and to determine if the overall pattern or distribution of these variables changes significantly across the studied time points (
Supplementary Table 1C), the top Top 3 (by Hotelling-T^2 value) are visualized
Figure 1D, they are GlycA, Glyc, GlycB
Table 1.
Characteristics of the study participants.
Table 1.
Characteristics of the study participants.
Variable, mean (SD) or n (%) |
COVID-19 patients |
Population controls |
Male/ Female (n, %) |
17 (41.46%)/ 24 (58.54%) |
17 (41.46%)/ 24 (58.54%) |
Age, average ± SD (years ± SD) |
56.63 ± 13.16 |
53.10 ± 11.41 |
BML, average (kg/m2 ± SD) |
34.40 ± 22.95 |
27.08 ± 4.60 |
Time in Hospital (days ± SD) |
9.18 ± 3.25 |
- |
Smoker/ non-smoker (n, %) |
3 (7.32%)/ 38 (92.68%) |
3 (7.32%)/ 38 (92.68%) |
|
Comorbidities* |
|
Number of patients with comorbidities (yes/no) |
30 (73.17%)/ 11 (26.83%) |
23 (56.10%)/ 18 (43.90%) |
Hypertension (n, %) |
17 (41.46%) |
9 (21.95%) |
Type 2 Diabetes Mellitus (n, %) |
3 (7.32%) |
3 (7.32%) |
Other cardiovascular disease (n, %) |
10 (24.39%) |
13 (31.71%) |
Oncological (n, %) |
2 (4.88%) |
3 (7.32%) |
Clinical measurements ** |
Average (SD) |
Acute COVID-19 |
Recovery phase(1 month)
|
Recovery phase(3-4 months)
|
Leukocytes (μL) |
5.71 (2.15) |
6.19 (1.37) |
5.67 (1.67) |
Hemoglobin (g/dL) |
13.24 (1.45) |
137.87 (11.20) |
137.88 (28.15) |
Hematocrit (%) |
39.91 (4.07) |
41.47 (3.08) |
41.26 (8.05) |
Platelets (μL) |
173.13 (99.07) |
267.10 (49.20) |
238.17 (59.95) |
Neutrophils (μL) |
1.51 (1.50) |
52.48 (9.58) |
50.92 (13.72) |
Lymphocytes (μL) |
4.86 (12.89) |
32.74 (10.36) |
32.54 (12.67) |
Monocytes (μL) |
27.99 (87.84) |
9.83 (2.42) |
8.42 (2.21) |
Eosinophils (μL) |
0.03 (0.05) |
3.03 (1.70) |
3.05 (1.62) |
ALT (U/l) |
23.82 (18.49) |
39.23 (28.53) |
30.95 (17.54) |
AST (U/l) |
28.75 (13.74) |
28.45 (13.68) |
26.75 (12.91) |
GGT (U/l) |
78.33 (99.33) |
44.43 (43.10) |
28.31 (30.13) |
Bilirubin (μmol/l) |
6.64 (3.22) |
13.30 (5.31) |
11.26 (4.60) |
LDH (U/L) |
295.00 (168.87) |
215.70 (38.10) |
189.67 (71.51) |
Creatinine ((μmol/L) |
72.45 (19.87) |
68.65 (11.53) |
70.57 (18.49) |
CRP (mg/L) |
35.05 (42.05) |
3.69 (2.80) |
3.39 (4.28) |
D-dimer (mg/ml) |
0.60 (0.12) |
0.48 (0.31) |
0.25 (0.15) |
We recognize that our two subgroups, Long Covid and Recovered, each consist of relatively small sample sizes (31 and 10 patients (x3 timepoints), respectively). However, the strength of our investigation lies in the longitudinal nature, which offers unique insights into the trajectory of COVID-19 patients over time. Metabolome data in blood samples collected at three distinct timepoints allows us to track the disease’s progression and recovery in patients after the acute phase. Therefore, we used the COVID-19 subgroups to test whether there is a significant difference between the Long Covid (n=31) and Recovered (n=10) groups. We performed limma linear regression analysis (Metaboanalyst [
18]) with three experimental factors (Phenotype, Subject, and Time) and compared the phenotype groups with Time and Subject as covariates. When phenotype groups were compared, 27 metabolites were identified as significant (p-adjusted<0.05) (
Supplementary Table 1D). The top 10 most significant by adjusted p-value are H2CH, H3FC, H2A2, H3A2, Choline, H2PL, H3CH, Succinic acid, Acetone, H3A1, Glucose.
Next, we set out to identify the significantly different metabolites between COVID-19 patients and population controls in each of the timepoints (A, B, C) and determine whether we see consistent changes across all contrasts that are involved in the pathogenesis of post-acute sequelae of SARS-CoV-2 infection(PASC)/ recovery from acute COVID-19, and identify the significant pathways. In contrast
COVID-19 Acute phase (timepoint A) vs Population Controls, we performed a t-test and identified 116 statistically significant metabolites (FDR<0.05) (
Supplementary Table 2A) and calculated Fold Change(FC) (
Supplementary Table 2B), 44 metabolites were significant with FC>1.5. The summary of both analyses is visualized in the Volcano plot (
Figure 2A), resulting in 13 strongly significant metabolites Fold Change treshold=2 & FDR<0,05., 3 of them downregulated (Acetic acid, Methionine, Proline), 10 upregulated (Threonine, Pyruvic acid, V5FC, Ornithine, V5CH, Acetoacetic acid, 2-Oxoglutaric acid, Sacrosine, Ethanol, 2-Hydroxybutyric acid).
Figure 2B shows pathway significance (obtained by Global Test pathway enrichment analysis) and pathway impact values (from pathway topology analysis) (
Supplementary Table 2C). Pathway analysis in this contrast revealed 32 significantly enriched pathways (FDR < 0.05), including (top10 by Impact values) Phenylalanine, tyrosine and tryptophan biosynthesis, Synthesis and degradation of ketone bodies, D-Glutamine and D-glutamate metabolism, Glycine, serine and threonine metabolism, Alanine, aspartate and glutamate metabolism, Phenylalanine metabolism, Arginine and proline metabolism, Pyruvate metabolism, Glycerolipid metabolism, Citrate cycle (TCA cycle). We also identified some of these pathways as significant in the acute phase vs. controls in our previous targeted metabolomics experiment (LC-MS) [
12]. Using liquid chromatography-mass spectrometry (LC-MS) in blood sera, we identified tryptophan (tryptophan, kynurenine, and 3-hydroxy-DL-kynurenine) and arginine (citrulline and ornithine) metabolism, glutamine depletion, altered metabolism of several amino acids as contributing pathways in the immune response to SARS-CoV-2 in severe COVID-19 [
12].
In the present study, we explored how Acute phase (timepoint A) metabolites correlate with hematological and biochemical analyses performed in a clinical lab in the recovery phase (Timepoint B and Timepoint C). Interestingly, the highest correlations (by R values) in Timepoint A metabolites vs Timepoint B Blood test contrast were Ornithine and AST (0,85), Proline and alanine aminotransferase ALT (0,83), Asparagine and C-reactive protein (CRP) (0,72). Interestingly, the highest significant correlations in Timepoint A metabolites vs Timepoint C Blood test were Tyrosine & D-dimer (-0,98), Glycerol & ALT (0,83), Ornithine & D-dimer (0,79).
Figure 2 shows Timepoint A metabolite correlations with blood analysis results at Timepoints B (2C) and C (2D), correlation coefficients for the significant comparisons can be found in
supplementary Table 2D,E.
In contrast: the
COVID-19 Recovery phase/month post-hospitalization (timepoint B) vs Population Controls t-test identified 21 statistically significant metabolites (FDR<0.05) (
Supplementary Table 3a) and calculated Fold Change(FC) (
Supplementary Table 3b), 44 metabolites were FC>1.5. The summary of both analyses is visualized in the Volcano plot (
Figure 3A), Acetic acid, V5FC, V5CH, V5PL, and Sacrosine were significantly different (FDR<0.05) in the analyzed contrast with Fold Change ± 1.5.
Figure 3B shows pathway significance (obtained by Global Test pathway enrichment analysis) and pathway impact values (from pathway topology analysis) (
Supplementary Table 3c). Pathway analysis in this contrast revealed three significantly enriched pathways (FDR < 0.05): beta-alanine metabolism, Glycolysis/Gluconeogenesis, and Pyruvate metabolism.
Figure 3C visualizes the results of correlation analysis, all statistically significant R values present color between the pair. We identified three correlations as the strongest (by R-value): Succinic acid and Erythrocytes (0.89), Succinic acid and GFR (0.89), Succinic acid and Hemoglobin (0.86).
The same analysis strategy was performed in the COVID-19
Recovery phase 3-4 months post-hospitalization (timepoint C) vs Population Controls contrast, summarised in
Figure 4A–C. T-test identified 10 statistically significant metabolites (FDR<0.05) (
Supplementary Table 4A), and 4 metabolites were FC>1.5 (
Supplementary Table 4B). Volcano plot (
Figure 4A) shows two metabolites (V5FC, Ornithine) as significantly different (FDR<0.05, Fold Change ± 1.5) in the analyzed contrast. Interestingly, pathway analysis (
Figure 4B) in this contrast revealed 16 significantly enriched pathways (FDR < 0.05) compared to only 3 in TimepointB vs Population controls. In Acute phase (timepoint C) vs Population Controls, the top three pathways by impact values (among statistically significant) are: D-glutamine and D-glutamate metabolism, Alanine, aspartate and glutamate metabolism, Glycine, serine and threonine metabolism.
Figure 4C heatmap visualizes the results of correlation analysis between Timepoint C metabolites and Timepoint C blood test, all statistically significant R values present color between pair. We identified these three correlations as strongest (by R-value): Cholesterol-Cholesterol (0,91), Triglycerides – Triglycerides (TG) (0,89), HDL-Cholesterol-HDL-Cholesterol (0,86) which are the same molecules only measured with different methods and labs, but next strong R values identified are: Apo-B100- Cholesterol (0,83), Isoleucine-D-dimer (-0,81), Apo-B100- Non-HDL cholesterol (0,80) (
Supplementary Table 4D).
4. Materials and Methods
Study design. Our study cohort consists of 41 hospitalized COVID-19 patients from whom blood was collected at three timepoints: (1) Acute phase (Timepoint A) samples were collected on 1st or 2nd day of hospitalization, (2) Recovery phase (Timepoint B) samples were collected 35 ± 14.7.65 days later, and (3) later Recovery phase (Timepoint C) samples were collected 99 ± 16.79 days after the first sample. Hematological and biochemical analyses in the acute phase were performed at the hospital’s clinical lab, but during recovery in a certified clinical laboratory in an ambulatory setting. Health Registry data was obtained for the patients, and they were divided into two subgroups (Recovered (n=10)/Long Covid (n=31), by these criteria: if they have new post-acute sequelae of SARS-CoV-2 infection (PASC) diagnosis in 12 months following COVID-19 acute phase (1-month acute phase, 12 month period as a post-acute period). We used the PASC diagnosis selection from the recently published work of Zang et al [
4]. We also compared COVID-19 patients to population controls (n=41) without acute infection at sample collection. All samples were collected during the same period (years 2020-2021). The study design is visualized in
Figure 1A, but cohort characteristics are summarised in
Table 1. Written informed consent was obtained from every participant before their inclusion in the study, and the study protocol was approved by the Central Medical Ethics Committee of Latvia (No. 01-29.1.2/928).
Sample preparation and instrumental analysis. The blood samples were collected in EDTA blood collection tubes and centrifuged to separate blood plasma, it was aliquoted, frozen, and stored according to standard operating procedures SOPs of the Genome Database of Latvian Population (National Biobank [
40]). The raw NMR spectrum in thawed blood plasma samples was recorded using the Bruker IVDr (B.I.) method package, following SOPs. Together, three modules of analytes were measured: B.I. QUANT-PS™, B.I. LISA™, B.I. BioBank QC™ using Bruker AVNEO 600 MHz IVDr NMR-Solution (Bruker BioSpin GmbH, Ettlingen, Germany). The annotation and quantification of serum spectra were provided automatically via Bruker IVDr Quantification in Plasma/Serum, B.I. Quant-PS™ analysis package (Bruker BioSpin GmbH, Ettlingen, Germany).
Statistical analysis. For statistical analysis of metabolites, we used a merged table of all quantified metabolites, including free metabolites (no protein denaturing done) across different chemical classes (Alcohols and derivatives, Amines and derivatives, Amino acids and derivatives, Carboxylic acids, Essential nutrients, Keto acids and derivatives, Sugars and derivatives, Sulfones, Technical additives), N-acetylated- glycoproteins (GlycA, GlycB), Glyc (N-acetylneuraminic acid), SPC (supramolecular phospholipids composite), quantified lipids and lipoproteins (Triglycerides, Cholesterol, LDL-Chol, HDL-Chol, LDL-Phos, HDL-Phos, Apo-A1, Apo-B100) and concentrations of lipoprotein VLDL, IDL, LDL HDL classes and subclasses. We also included biologically relevant proportion calculations: Apo-B100/Apo-A1, Glyc/SPC, LDL-cholesterol (LDCH) / HDL-cholesterol (HDCH) as possible biomarkers across contrasts. Together, we analyzed 169 quantified features for 164 samples. See quantified input features for all samples in
Supplementary Table 1A.
For statistical analysis, we used web-based Metaboanalyst software 5.0 [
18]. To normalize metabolite data, we applied log transformation (base 10) and Pareto scaling (mean-centered and divided by the square root of the standard deviation of each variable).
Linear Regression for Time-Series data analysis in COVID-19 patients. The underlying method is based on limma [
41]. We analyzed two models: (1) time series data and (2) Time series + Phenotype (Recovered/Long Covid). For the (1) model, the reference group was set to be Timepoint C when compared to Timepoints A and B, but Timepoint B as a reference group to compare with Timepoint A; for the (2) model the direction of comparison was Long Covid vs Recovered. In both analyses,
Subject, indicating one patient, was used as a covariate to adjust for repeated within-subject samples. For time-pattern analysis, we used multivariate empirical Bayes statistical time-series analysis (MEBA) at Metaboanalyst, a method to rank the features by calculating Hotelling’s T
2. It can be applied to determine if these variables’ overall pattern or distribution changes significantly across timepoints.
Univariate analysis for two group comparisons. To identify the significantly different metabolites between COVID-19 patients and population controls in each of the timepoints (A, B, C) and determine whether we see consistent changes across all contrasts that are involved in the pathogenesis of/ recovery from acute COVID-19, we also performed the univariate analysis in each of the interesting contrasts with Metaboanalyst: t-test to determine significance and fold change analysis calculation to determine the level of changed metabolite concentration, to visualize the significance and fold change analysis results in volcano plot -log10(p-value) and log2FoldChange were calculated in Metaboanalyst and visualized with Prism Graphpad 9.
Pathway analysis. We also used Metaboanalyst to determine significant pathways in each of the contrasts, for that, a set of features (quantified metabolites) was used (only the ones that could be annotated). Log transformation, Pareto scaling was applied for data normalization. The reference pathway library was selected for Homo Sapiens (KEGG). The enrichment method (Global Test) was used to calculate adjusted p-values and -log10(p-value) for the plot. the reference pathway library was selected Homo Sapiens (KEGG).
Biochemical analysis and metabolite correlations. We also analyzed correlations of the metabolites with the hematological and biochemical markers from laboratory analyses at different timepoints. In each pairwise correlation, we excluded any patient if the respective laboratory analysis was not performed or the metabolite/marker was below detection level, and the correlation coefficient was only calculated if data were available for at least five patients. The calculation was done using the non-parametric Kendall’s Tau.