1. Introduction
Statistical analysis options for repeated measures designs vary depending upon the nature of the outcome variable assessed. When the outcome variable (dependent variable) is continuous and normally distributed (multivariate normal), there are more analysis options than when the dependent variable is discrete. Most students taking statistics courses in undergraduate or even some graduate programs learn little about the different statistical analysis methods available for dealing with repeated measures designs, especially when the dependent variable is discrete. The longitudinal data analysis method they are most likely to encounter is repeated measures ANOVA (RM-ANOVA). Although it provides some flexibility, such as the ability to assess differences among time points and the shape of change over time in a post hoc analysis (e.g., linear or quadratic), RM-ANOVA has limitations when compared to more advanced multivariate statistical methods such as Generalized Estimating Equations (GEE) and Latent Growth Curve Modeling (LGCM). Both of these methods, however, have advantages and disadvantages as well. A more recent method beginning to be employed in the assessment of repeated measures of a variable of interest is area under the curve (AUC). Researchers are now assessing its comparative effectiveness in relation to these methods.
GEE is a multivariate method that has several advantages over RM-ANOVA [1, 2]. First, unlike RM-ANOVA, GEE permits repeated measure variables to be discrete or continuous. Second, there is no need to specify the multivariate distribution, as the aim is parameter estimation instead of model testing, and GEE provides asymptotically normal and consistent parameter estimates with no more than specification of the correct mean structure [3]. RM-ANOVA, by contrast, assumes the repeated measures are multivariate normally distributed [4]. Third, GEE permits the selection of the appropriate correlation structure (i.e., the correlations among the repeated measures), and the consistency of parameter estimates does not suffer from an incorrectly specified correlation structure, although efficiency may be reduced. RM-ANOVA assumes the correlations between measures remain constant [5]. Further, GEE permits estimation of more complicated relations in a single model, such as when one desires to explore the relation between two repeated measures variables (e.g., the effect of change in one variable on change in a second variable). Assessing the effects of time-varying covariates is not easily available in RM-ANOVA, although one can format the data to permit the use of a time-varying covariate [6].
Equation 1 presents a basic GEE cumulative logit model with three covariates ($x_1$ through $x_3$; the number of covariates employed in the present study), where the logit is the link function relating the covariates to the repeated-measures outcome variable $Y_{it}$, $j$ is the level of the ordinal variable Y, where $j$ ranges from 1 to J-1, and t represents time (i.e., the unit of time for the repeated measures). Equation 2 is the general cumulative logit model relating the vectors of predictor variables (X) and parameters (β) to the vector of outcomes (Y), with the mean structure being the probability of the $j$th level of the ordinal outcome variable y for individual i at time t (equation 3) [3].
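For reference, a cumulative logit model matching the description above is commonly written as below. This is a sketch under stated assumptions, not necessarily the authors' notation: the threshold symbol $\alpha_j$, the positive sign on the coefficients (some texts subtract the linear predictor), and the cumulative form chosen for equation 3 are all assumed.

```latex
% Equation 1 (assumed form): cumulative logit model with three covariates
\operatorname{logit}\bigl[P(Y_{it} \le j \mid x_{it})\bigr]
  = \alpha_j + \beta_1 x_{1it} + \beta_2 x_{2it} + \beta_3 x_{3it},
  \qquad j = 1, \dots, J-1

% Equation 2 (assumed form): the same model in vector notation
\operatorname{logit}\bigl[P(Y_{it} \le j \mid X_{it})\bigr]
  = \alpha_j + X_{it}^{\top}\beta

% Equation 3 (assumed form): the cumulative probability implied by equation 2
P(Y_{it} \le j \mid X_{it})
  = \frac{\exp\bigl(\alpha_j + X_{it}^{\top}\beta\bigr)}
         {1 + \exp\bigl(\alpha_j + X_{it}^{\top}\beta\bigr)}
```

Under this form, the coefficients $\beta$ do not carry a $j$ subscript, so a single set of covariate effects applies to every cut point of the ordinal outcome.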
While there are indeed many advantages to GEE over RM-ANOVA, limitations include the inability to test models and compare different models, since GEE lacks a likelihood function. Further, the treatment of missing data is a concern. With GEE, one assumes that data are missing completely at random (MCAR), meaning that the available cases are a random sample of all cases (absent missing data) [3, 7, 8]. Although plausible, MCAR is less likely than the data being missing at random (MAR). When the data are MAR, missingness is related to a predictor variable (or predictor variables) or to prior measures of the dependent variable. This is a less restrictive assumption. Although extensions of GEE accommodate missing data that are not MCAR [9, 10], these are less likely to be employed in a naïve analysis.
Another issue related to modeling discrete dependent variables in GEE is the use of correlation coefficients to represent the relations among the repeated measures. Simulation studies suggest that the best measure of association among repeated measures of discrete data is the local odds ratio (LOR) rather than correlation coefficients [11, 12]. Although we did not address this limitation in our GEE analysis, as our aim was to conduct a simple (naïve) analysis that most researchers would do, we note that there are more efficient ways to estimate associations than correlation coefficients.
Many of the restrictions encountered with GEE do not arise when using LGCM with repeated measures data. LGCM is a multivariate method that employs unobserved (latent) variables to represent initial level (baseline) and rate of change from baseline (trend) [13, 14]. Equation 4 presents the relations among the latent and observed variables in a prototypical LGCM with ordinal data [15].
Equation 5 clarifies equation 4 with respect to three repeated measures, the number of repeated measures employed in the present study.
Working from left to right, we have a vector of ordinal outcome variables for participant i at time t, for the T = 3 time points. On the right side of the equation, we have a vector of regression intercepts, set to equal zero. The Λ matrix includes the factor loadings for the intercept factor (first column) and the slope factor (second column). The intercept loadings are set to 1, indicating a constant relation between the intercept and the repeated measures. The unit-increasing factor loadings in the second column specify linear growth over time. The next vector includes the two factors (η; eta), the first representing the intercept factor, and the second representing a linear trend factor. If we had proposed a quadratic trend, there would have been a third factor term. For a cubic trend, there would have been a fourth factor term. The final vector represents the residuals, one for each time point. These are assumed to be uncorrelated, although at times researchers may correlate them to improve model fit to the data. This is generally not recommended, however [16, 17, 18].
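The verbal description of equation 5 can be written out as the matrix expression below. This is a hedged reconstruction from the text: the symbols $y^{*}$ for the outcome vector, $\eta_{0i}$ and $\eta_{1i}$ for the intercept and slope factors, $\varepsilon$ for the residuals, and slope loadings starting at 0 (that is, 0, 1, 2) are assumptions.

```latex
% Equation 5 (assumed form): linear LGCM for T = 3 repeated measures
\begin{bmatrix} y^{*}_{i1} \\ y^{*}_{i2} \\ y^{*}_{i3} \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
+
\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}
\begin{bmatrix} \eta_{0i} \\ \eta_{1i} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \varepsilon_{i3} \end{bmatrix}
```

A quadratic trend would add a third column to Λ (for example 0, 1, 4 under this coding) and a third element to the factor vector.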
The latent variables (η) underlying the observed ordinal variables are continuous, not discrete. To capture progression across increasing levels of the ordinal variable, LGCM uses thresholds (τ; tau). Thresholds are cut points in a continuum representing propensities to progress from one ordinal category to the next in an observed variable [15, 19]. To assess change over time, these thresholds are constrained to equality across time points. As such, a change in the proportion of participants across the various levels of an ordinal variable, increasing from a lower to a higher category over time, results in an increased propensity to cross thresholds to higher levels of the dependent variable.
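The role of thresholds can be illustrated with a small sketch: a continuous latent value is mapped to an observed ordinal category by counting how many cut points it has crossed. The function name, the threshold values, and the latent values are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
from bisect import bisect_right


def ordinal_category(latent_value, thresholds):
    """Return the 0-based ordinal category for a continuous latent value.

    `thresholds` must be sorted in ascending order. The category is the
    number of thresholds at or below the latent value, so a value exactly
    on a threshold falls in the higher category (an arbitrary convention).
    """
    return bisect_right(thresholds, latent_value)


# Hypothetical thresholds for a four-level ordinal variable (three cut
# points), held equal across waves as the text describes.
TAU = [0.5, 1.2, 2.0]

# Hypothetical latent propensities for one participant at three waves.
latent_by_wave = [0.3, 0.9, 1.5]
categories = [ordinal_category(v, TAU) for v in latent_by_wave]
print(categories)
```

Because the same cut points are used at every wave, movement into higher observed categories can only come from an increase in the latent propensity.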
The latent variables are assumed to be multivariate normally distributed, with mean α (alpha) and variance/covariance ψ (psi; equation 6) [15, 19]. The residuals are assumed to be normally distributed with mean 0 and variance θ (theta; equation 7). For a very detailed presentation of LGCM with ordinal data, see Mehta et al. [15]; see also Masyn et al. [19]. For a general introduction to growth modeling, see Duncan and Duncan [13].
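In symbols, the two distributional assumptions just described can be sketched as follows. The subscripts and the use of capital $\Psi$ for the factor covariance matrix are notational assumptions.

```latex
% Equation 6 (assumed form): growth factors are multivariate normal
\eta_i \sim \mathcal{N}(\alpha, \Psi)

% Equation 7 (assumed form): residuals are normal with mean zero
\varepsilon_{it} \sim \mathcal{N}(0, \theta)
```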
LGCM also permits researchers to assess the effect of potential predictor variables on baseline and trend factors. In addition, one can assess the effects of another repeated measures variable on repeated measures of the dependent variable, what is commonly known as an associative (parallel) processes LGCM [20]. This could involve, for instance, assessment of the effects of baseline level for one process on a trend for a second process, and trend-to-trend effects. We represent effects on the growth factors in equation 8.
Advantages of LGCM include a likelihood function, allowing for model comparisons. In addition, one can test multiple hypotheses, exploring the effects of predictors on repeated measures of a dependent variable. Moreover, one can assess the effects of initial level and rate of change on other outcome variables, whether latent or observed. Further, as with GEE, one can assess change in a dependent variable whether it is continuous or discrete. Another advantage of LGCM, not shared by GEE, is how it handles missing data. Unlike GEE, which assumes MCAR, LGCM assumes data are MAR. This is a far more tenable assumption. Using Full Information Maximum Likelihood (FIML) estimation, LGCM uses all available data on the dependent variables to estimate parameters [21]. Thus, it has the same sample size as GEE, but with a less restrictive assumption.
Despite the advantages of both methods, conducting GEE and LGCM requires more than a rudimentary understanding of statistics. Moreover, although not necessarily the case for GEE, one needs specific software to conduct LGCM (e.g., Mplus). This can be an unnecessary impediment to researchers if they are interested in the assessment of longitudinal processes but not in the impact of selected covariates on rates of change, or the shape of development (e.g., linear, quadratic, or cubic). For instance, a researcher may wish to assess whether males and females differ in psychological distress (PD) over multiple timepoints, but not whether males differ from females in the rate of change in PD from baseline, or whether change in PD is linear or quadratic in nature. Further, one important limitation with LGCM and GEE alike is related to the nature of ordinal variables, the type of discrete random variable we are employing in our analysis. When modeling an ordinal outcome variable (Y) with k categories, one assumes proportional odds, such that the odds related to a set of predictor variables (X) remain constant when comparing different levels of the dependent variable [22, 23, 24, 25]. This is seen in equation 1, where we predict the log odds (logit) of being in category ≤ j versus > j given a set of predictor variables (x). However, this assumption may not always hold in practice, in which case multinomial logistic regression rather than ordinal logistic regression is the more appropriate analysis option.
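The proportional odds assumption can be made concrete with a short numerical sketch. It assumes the cumulative logit form logit[P(Y ≤ j | x)] = α_j + x'β; the function names, the thresholds, and the single coefficient below are hypothetical and are not estimates from the study.

```python
import math


def cumulative_probs(alphas, betas, x):
    """P(Y <= j | x) for j = 1..J-1 under logit[P(Y <= j)] = alpha_j + x'beta."""
    eta = sum(b * v for b, v in zip(betas, x))
    return [1.0 / (1.0 + math.exp(-(a + eta))) for a in alphas]


def category_probs(alphas, betas, x):
    """P(Y = j | x) for all J categories, obtained by differencing."""
    cum = cumulative_probs(alphas, betas, x) + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def odds(p):
    return p / (1.0 - p)


# Hypothetical values: three increasing cut points (four categories) and
# one covariate effect on the cumulative logit scale.
ALPHAS = [-0.5, 0.4, 1.3]
BETAS = [-0.08]

low = cumulative_probs(ALPHAS, BETAS, [5.0])
high = cumulative_probs(ALPHAS, BETAS, [6.0])

# Under proportional odds, a one-unit increase in x multiplies the
# cumulative odds by exp(beta) at every cut point j.
odds_ratios = [odds(h) / odds(l) for l, h in zip(low, high)]
print(odds_ratios, math.exp(BETAS[0]))
```

If separate coefficients were needed at each cut point to describe the data, the ratios would differ across j and the assumption would be violated.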
One method that researchers are beginning to investigate as a viable alternative for longitudinal data analysis is area under the curve (AUC). With this method, one calculates the area under the curve generated by the repeated measures. Equation 9 presents the calculation of area under the curve for repeated measures analysis, known as AUC with respect to the ground [26, 27, 28], employing the trapezoid rule across T timepoints.
If we let $t_i$ represent our intervals $x_t - x_{t-1}$, there will be one fewer interval than time points (equation 10).
If the intervals are constant in length, equation 10 reduces to equation 11.
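For reference, the trapezoid-rule calculation described here is commonly written as below. This is a hedged reconstruction: the symbol $y_t$ for the measurement at timepoint $x_t$, the indexing, and the choice of a unit interval in equation 11 are assumptions.

```latex
% Equation 9 (assumed form): trapezoid rule across T timepoints
AUC_G = \sum_{t=2}^{T} \frac{(y_t + y_{t-1})\,(x_t - x_{t-1})}{2}

% Equation 10 (assumed form): the same sum written with T-1 intervals t_i
AUC_G = \sum_{i=1}^{T-1} \frac{(y_{i+1} + y_i)\, t_i}{2}

% Equation 11 (assumed form): constant intervals of unit length
AUC_G = \sum_{i=1}^{T-1} \frac{y_{i+1} + y_i}{2}
```

With three waves, for example, equation 11 becomes $(y_1 + y_2)/2 + (y_2 + y_3)/2$.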
Researchers have shown that AUC performs as well as LGCM when the data are continuous or discrete counts [26, 27]. However, to our knowledge no study has yet compared AUC to GEE or LGCM using discrete ordinal random variables.
AUC has several advantages over GEE and LGCM. First, instead of multiple dependent variables to form growth curves, AUC is a single variable that can be employed in simpler statistical methods such as regression analysis, t-tests, and ANOVAs, as well as multivariate methods such as structural equation modeling (SEM). Second, one does not need advanced statistical knowledge to understand AUC, although some basic knowledge about distributions (e.g., normal or Poisson) is a plus. Further, there is no need to consider issues such as the proportional odds assumption when calculating AUC, as the primary concern is the area created by the different trapezoids covering the mass or density for the T-1 intervals.
Despite these advantages, there are some key limitations. First, in calculating AUC without any modifications, individuals missing data at a given timepoint are deleted listwise, meaning the entire record for that participant is expunged, resulting in a potentially drastic reduction in sample size. As such, AUC assumes data are MCAR, and data imputation methods are necessary to reduce the number of missing cases. Second, one must currently write syntax to calculate AUC for repeated measures designs, and this may be difficult for those with limited experience with syntax and coding. These limitations notwithstanding, AUC may provide a useful alternative to other repeated measures statistical methods. As such, the purpose of the present study was to assess the relative efficacy of AUC compared to GEE and LGCM using real data from a publicly available data source. Using data from the Panel Study of Income Dynamics (PSID) Transition to Adulthood (TA) study, and all three data analysis methods, we assessed the impact of repeated measures of a continuous predictor variable, psychological distress, on repeated measures of an ordinal dependent variable, nicotine use, both measured in 2017, 2019, and 2021.
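As an illustration of the kind of syntax involved, the sketch below computes AUC with respect to the ground by the trapezoid rule and shows how an unmodified calculation leads to listwise deletion. The function name `auc_ground`, the participant identifiers, and the scores are hypothetical; this is not the code used in the study.

```python
def auc_ground(values, times=None):
    """Trapezoid-rule AUC with respect to the ground.

    `values` holds one measurement per timepoint. `times` holds the
    timepoints themselves; if omitted, unit spacing (0, 1, 2, ...) is
    assumed. Returns None when any measurement is missing, which mimics
    listwise deletion of that participant.
    """
    if any(v is None for v in values):
        return None
    if times is None:
        times = list(range(len(values)))
    if len(times) != len(values) or len(values) < 2:
        raise ValueError("need at least two timepoints and matching lengths")
    return sum(
        (values[k] + values[k - 1]) * (times[k] - times[k - 1]) / 2.0
        for k in range(1, len(values))
    )


# Hypothetical distress scores at three waves for three participants.
records = {
    "p1": [4, 6, 8],
    "p2": [10, None, 7],  # missing the second wave
    "p3": [0, 0, 3],
}

auc_by_person = {pid: auc_ground(scores) for pid, scores in records.items()}
complete_cases = {pid: a for pid, a in auc_by_person.items() if a is not None}
print(complete_cases)
```

Passing the actual wave years, for example `auc_ground([4, 6, 8], times=[2017, 2019, 2021])`, rescales the area by the interval length but does not change the ordering of participants when the spacing is the same for everyone.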
Rationale for selecting these variables. In this study, we assessed the relation between psychological distress and smoking in older adolescents and young adults (ages 18-24). Smoking is the leading cause of preventable disease and death in the United States [29], and while combustible cigarette smoking has decreased in recent years, e-cigarette smoking (vaping) has been on the rise, and both remain serious risk factors for poor health outcomes [30, 31]. A potential reason for smoking in older adolescents and young adults is the rising prevalence of mental health issues [32]. Nicotine can temporarily relieve symptoms of mental illness, such as anxiety and depression, and many people suffering from these issues may use it to self-medicate, leading to nicotine dependence [33, 34, 35, 36]. In the long term, however, smoking can increase susceptibility to anxiety and the severity of depression, exacerbating poor mental health [36, 37]. Thus, studying the relation between psychological distress and smoking in older adolescents and young adults may help to find solutions to both problems. Finally, we also controlled for race and education level, as these are two factors that are known to affect smoking status [38].
2. Materials and Methods
Participants and procedures. Participants were drawn from a total sample of n = 4222 young adults (18-24 years old) at each of three data collection waves (2017, 2019, and 2021) who were taking part in the Panel Study of Income Dynamics (PSID) Transition to Adulthood (TA) supplement. Started in 1968, the PSID is the longest continuously running cohort study [39]. Although its original purpose was to understand the intergenerational transmission of poverty, subsequent data collections have included a variety of variables covering health and other experiential domains. The final sample sizes per analysis differed due to missing data and its treatment in each statistical method, with 2510 participants for the GEE, and 2511 participants for LGCM and AUC, after multiple imputation for the AUC analysis. The large number of participants missing per wave resulted from the transitional nature of this supplement. Participants age out and new participants enter the TA sample during each data collection wave. However, we were only interested in following the same participants (cohort) across the data collection waves to meet our aim. We found that the three waves selected maximized our cohort size, as adding additional waves resulted in a drastic decrease in sample size.
Instrumentation. To estimate use of nicotine across time, we generated a four-level ordinal variable from two PSID questions related to one’s nicotine use behavior (i.e., “Do you smoke cigarettes?” and “Have you ever vaped?”). Values are 0 = does not use a nicotine product (neither combustible cigarettes [smoking] nor electronic cigarettes [e-cigarettes; vaping]), 1 = vaped or vapes, 2 = smokes, and 3 = vaped/vapes and smokes. We assessed psychological distress with a variable that is the sum of six five-point (0 through 4) Likert-style questions asking how often in the past month the respondent felt nervous, hopeless, restless, that everything was an effort, too sad, and worthless. Scores ranged from 0 to 24 points. Finally, we controlled for race (0 = non-White, 1 = White) and education (0 = greater than a high school education, 1 = high school education or less).
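The coding scheme described above can be sketched in a few lines. The function names and the boolean inputs are hypothetical stand-ins for the recoded PSID responses; the actual PSID variable names and response codes are not shown here.

```python
def nicotine_level(smokes, vapes):
    """Four-level ordinal nicotine variable as described in the text.

    0 = no nicotine product, 1 = vaped or vapes only, 2 = smokes only,
    3 = vaped/vapes and smokes.
    """
    if smokes and vapes:
        return 3
    if smokes:
        return 2
    if vapes:
        return 1
    return 0


def distress_score(items):
    """Sum of six items, each scored 0 through 4, giving a 0-24 total."""
    if len(items) != 6:
        raise ValueError("expected exactly six items")
    if any(not 0 <= v <= 4 for v in items):
        raise ValueError("each item must be scored 0 through 4")
    return sum(items)


# Hypothetical respondent: vapes but does not smoke, moderate distress.
level = nicotine_level(smokes=False, vapes=True)
score = distress_score([2, 1, 3, 2, 0, 1])
print(level, score)
```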
Data analysis methods. We conducted GEE, LGCM, and AUC. Our GEE analysis included four predictor variables: the time-varying covariate psychological distress, the two time-invariant covariates education and race (both measured in 2017), and time (i.e., data collection wave). We also included two interaction terms, one for each of the time-invariant covariates: race by time and education by time. This allowed us to assess whether the effects of 2017 education and race on smoking, if any, changed across time.
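GEE software generally expects one row per participant per wave (long format). The sketch below shows one way to reshape a wide record into long rows and build the two interaction terms described above. The field names, the coding of time as 0, 1, 2, and the example values are assumptions made for illustration.

```python
WAVES = [2017, 2019, 2021]


def wide_to_long(person):
    """Expand one wide record into one row per wave.

    Time is coded 0, 1, 2 (an assumption). The interaction terms are the
    products of each time-invariant covariate with time.
    """
    rows = []
    for t, wave in enumerate(WAVES):
        rows.append({
            "id": person["id"],
            "wave": wave,
            "time": t,
            "nicotine": person["nicotine"][t],   # ordinal outcome, 0-3
            "distress": person["distress"][t],   # time-varying covariate
            "race": person["race"],              # time-invariant, 2017
            "education": person["education"],    # time-invariant, 2017
            "race_x_time": person["race"] * t,
            "education_x_time": person["education"] * t,
        })
    return rows


# Hypothetical participant.
example = {
    "id": 101,
    "nicotine": [0, 1, 3],
    "distress": [5, 9, 12],
    "race": 1,
    "education": 0,
}
long_rows = wide_to_long(example)
for row in long_rows:
    print(row)
```

The participant identifier then serves as the clustering variable that tells the GEE routine which rows are repeated measures of the same person.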
For the LGCM, we conducted an associative-processes model, with one process for repeated measures of psychological distress and a second process for smoking. Each LGCM included two factors, one for intercept and one for a linear trend. In each LGCM, we controlled for the effects of the time-invariant covariates, education and race, on the baseline level and linear trend factors. We also controlled for the effect of baseline level of the opposite LGCM on each trend factor (i.e., the effect of baseline smoking on psychological distress trend, and the effect of baseline psychological distress on smoking trend). Finally, we assessed the effect of psychological distress trend on smoking trend. To assess the fit of our LGCM to the data, we used the chi-square test of model fit, the Comparative Fit Index (CFI), the Root Mean Square Error of Approximation (RMSEA), and the Standardized Root Mean Square Residual (SRMR). Heuristics for good fit include a non-significant chi-square, CFI ≥ 0.95, RMSEA ≤ 0.05, and SRMR ≤ 0.06 [40, 41, 42].
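The fit heuristics listed above amount to a handful of threshold comparisons, sketched below. The function name, the significance level of 0.05 for the chi-square test, and the example fit values are assumptions; the example values are not results from the study.

```python
def fit_checks(chi2_p, cfi, rmsea, srmr, alpha=0.05):
    """Compare fit statistics with the cutoffs quoted in the text.

    Returns a dict mapping each heuristic to True when it is satisfied.
    `alpha` is the assumed significance level for the chi-square test.
    """
    return {
        "chi_square_nonsignificant": chi2_p > alpha,
        "cfi": cfi >= 0.95,
        "rmsea": rmsea <= 0.05,
        "srmr": srmr <= 0.06,
    }


# Hypothetical fit statistics for illustration only.
example_fit = fit_checks(chi2_p=0.21, cfi=0.97, rmsea=0.03, srmr=0.07)
print(example_fit)
```

Such cutoffs are rules of thumb, so a model that misses one of them is usually examined further rather than rejected outright.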
For AUC, we employed multiple regression analysis to assess the effects of psychological distress AUC on smoking AUC, controlling for race and education, both measured at 2017.
3. Discussion
The aim of the present study was to compare three methods for analyzing repeated measures data: GEE, LGCM, and AUC. The results of all three methods agreed on the positive effects of psychological distress on smoking, with LGCM being able to partition the effects between baseline level and trend. This is a key difference among the methods, and suggests that researchers interested in understanding how different facets of growth in a longitudinal process, whether initial level or rate of change, affect each other may be best served by LGCM. For researchers interested in overall effects, AUC appears to be the most reasonable choice, as it incorporates the entirety of change within a single variable, whether a predictor or an outcome variable. Nevertheless, the handling of missing data in AUC and GEE can be problematic, especially if one is not well versed in handling missing data using methods such as multiple imputation.
Even though the aim of this study was to compare performance among the three different statistical methods, the finding that psychological distress was associated with an increase in the propensity to smoke with all three methods is noteworthy. A potential reason for this robust association is that psychological distress may cause individuals to smoke as a way to self-medicate for symptoms of mental illness. Based on the findings from each method, it would appear that psychological distress predisposes one to nicotine use, and that the effect persists across time. This is seen in the LGCM, with the significant effect of baseline distress on smoking trend along with a non-significant effect from psychological distress trend to smoking trend. This result is mirrored by the effect of psychological distress on smoking in both GEE and AUC, two methods that essentially aggregate change into a single variable (long-format data structure in GEE, calculation in AUC).
Another interesting contrast that mirrors the effects seen with psychological distress emerges when comparing the effects of education and race on smoking in LGCM and AUC. Both results mirror what is seen with distress, as there appears to be a ceiling effect for race and education in LGCM, and a clear positive effect for the two covariates with AUC. This highlights the greater ability to partition effects with LGCM. It is notable that neither variable had a significant effect on smoking using GEE, either on its own or as an interaction term with time. More research is needed comparing these different methods to better understand the nature of such differing effects.
As with all studies, there are several limitations to our work. First, our smoking variable was generated based on the available questions. The questions related to vaping were not as well defined as those for combustible cigarette smoking, at times failing to delineate past from current use. Second, the low prevalence of smoking in this sample may have affected the results. Third, we only controlled for two variables, education and race. However, as the aim of this study was to compare the three methods, we did not find it necessary to go beyond two control variables. Researchers interested in better understanding the relation between psychological distress and smoking should include additional control variables. These limitations noted, this study adds to recent studies that suggest that AUC is a viable alternative method for assessing longitudinal data, especially when the research aim is to understand overall effects rather than partitioning effects by baseline level and trend. Nevertheless, all three methods are useful for researchers interested in assessing change, with AUC being the most accessible in our opinion. The next steps are to develop programs to calculate AUC regardless of the number of repeated measures, develop an equation that incorporates baseline to assess change without allowing for negative values, and add the ability to account for missing data through multiple imputation without the researcher having to write additional code.