This review was informed by the ROBINS (Risk of Bias in Non-randomized Studies) framework, developed through expert consensus to evaluate and mitigate statistical bias in non-randomized studies (D’Andrea et al., 2021; Sterne et al., 2016). The principles from the ROBINS framework were applied to a recent meta-analysis, chosen because it represented possibly the most extensive recent meta-analysis in the field, spanning 107 observational studies that assessed the relative risk of exclusive and dual EC and CC use, across multiple categories of harm including cardiovascular disease, stroke, metabolic dysfunction, asthma, COPD (chronic obstructive pulmonary disease), and oral disease (Glantz et al., 2024).
During the preparation of this review, feedback was solicited from dozens of experts in the field (see Acknowledgements). In addition, an initial draft has been posted on the public preprint server bioRxiv. Further supporting data are available in the Supplement. The authors invite and welcome timely feedback on the pre-publication draft from anyone in the research community.
Considerations Impacting Precision and Accuracy of Observational Studies
The ROBINS framework identifies seven domains that can impact the precision and accuracy of observational studies. Broadly, these sources of potential bias fall into three groupings: definitions of population and sample, characterization of exposure history, and selection of outcomes and results, as enumerated in Figure 1 and discussed below.
Figure 1.
Categories of factors impacting precision and accuracy of non-randomized observational studies of exposure.
ROBINS-E domains aggregate into three categories, spanning characterization of population and sample, exposure history, and outcomes and results. Observational studies of EC use may include one or more of the cohorts represented across the three rows: EC users who are never, former, and current (dual) users of CC. The factors here are described specifically for exposure to EC and CC, but can generalize to other tobacco and nicotine products.
What is the Risk Associated with Exclusive Product Use?
One way to minimize statistical bias in observational studies is by restricting analyses to samples that have only ever used one product to a degree sufficient to impact risk. However, significant biases may still arise if there is not a comprehensive characterization of exposure history, validation that exposure precedes harm in retrospective studies, confirmation of persistence of use state in prospective studies, and sufficiency of outcome sample size. Statistical precision and accuracy can be further validated through convergence of multiple lines of evidence, including dose-responses and consideration of counterfactuals (Munafò & Smith, 2018).
Population and Sample-Related Considerations
Sample definition
Samples termed “EC users” often include a history of former CC use, and sometimes even current CC dual-use. A recommendation in these cases is to use more precise terminology, to prevent an inaccurate perception or interpretation that there was no previous or concurrent exposure to other products. Furthermore, EC cohorts which also have former or current exposure to CC tobacco product use require additional considerations to minimize the impact of confounders, as discussed further in this paper.
The PATH Wave 6 data set provides an illustrative example. Of older (35+ years old) current exclusive EC users (past 30-day use of only EC), 91% were former CC smokers. Older current EC users who were never CC users (<100 lifetime cigarettes smoked) only represented ~80 participants in this survey of over 30,000 participants (see Figure SI-1). This highlights the challenges in accurately evaluating diseases which may manifest after decades of former CC use.
Exposure History-Related Considerations
What product was used?
Electronic cigarettes represent a heterogeneous and evolving category. Exposure to nicotine and toxicants, as well as efficacy in displacing CC, may vary dramatically across products; specification of which products were used can therefore be an important consideration in the precision and generalizability of any results. Firstly, it is important to identify which active molecule was vaped. EC use in some survey questions may encompass cannabis use, and in other cases the product used is nicotine-free (Selya et al., 2024). Nicotine can be formulated as a free-base or as nicotine salts with potentially different abuse liability profiles. In general, earlier generations of EC utilized lower concentrations of free-base nicotine, while current generations utilize higher concentrations of nicotine salt formulations, with higher nicotine flux. There may be an inverse relationship between nicotine concentration and toxicant exposure due to nicotine titration (El Hourani et al., 2022). Newer products may be associated with increased efficacy in switching (Kasza et al., 2021). Toxicant exposure may be higher in products which are not temperature regulated, or which do not utilize quality system practices and manufacturing processes which pass FDA review. An extreme example was the phenomenon of EVALI-related harm arising from use of Vitamin E acetate as a cost-cutting solubilizing agent by some manufacturers of FDA-unregulated THC-containing vapes (Marrocco et al., 2022). Flavorants may impact emission chemistry and toxicant profile. The impact of exposure to nicotine analogue molecules such as methylated nicotines is also not well understood (Erythropel et al., 2024). Depending on the specificity of the analysis needed, brand name descriptors, flavor profile, source of purchase, or visual confirmation can all help confirm which product was used. Lastly, “usual product” may or may not be representative of all products used, unless it is confirmed that the usual product was used exclusively.
Comprehensive history of use
Risk typically increases with repeated exposure to tobacco products. This was demonstrated in the Atherosclerosis Risk in Communities (ARIC) prospective multi-decade longitudinal study that evaluated the predictive value of CC use history metrics on subsequent cardiovascular disease (CVD) risk (Lubin, Couper, et al., 2016). This study found that:
“Pack-years remained the primary determinant of smoking-related CVD risk…smoking fewer cigarettes/day for longer duration was more deleterious than smoking more cigarettes/day for shorter duration…no single metric, cigarettes/day, smoking duration, or pack-years, fully characterizes smoking-related risks.”
Similarly, for chronic obstructive pulmonary disease (COPD), duration of smoking is more predictive of risk than cigarettes per day (Bhatt et al., 2018). In summary, for some non-randomized studies of exposure (NRSEs), it may be sufficient to characterize CC use history by either use duration or pack-years. However, depending on the analytic precision required, more granular exposure metrics may be necessary (Lubin, Albanes, et al., 2016).
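The point that no single metric fully characterizes exposure can be illustrated with a minimal sketch. It assumes the conventional definition of a pack-year (20 cigarettes per day for one year); the two smokers below are hypothetical and are not drawn from ARIC or any other data set.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Conventional pack-years: (cigarettes per day / 20) * years smoked."""
    return (cigarettes_per_day / 20.0) * years_smoked


# Hypothetical smokers with very different intensity and duration.
long_light = pack_years(cigarettes_per_day=10, years_smoked=40)
short_heavy = pack_years(cigarettes_per_day=40, years_smoked=10)

# Both values are identical, even though the quoted ARIC finding suggests
# the longer-duration pattern may carry the greater CVD risk.
print(long_light, short_heavy)
```

Because both histories collapse to the same pack-year value, an analysis that records only pack-years cannot distinguish them, which is one reason duration and intensity may need to be captured separately.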
For quantifying EC exposure, there is no standardized equivalent to the CC pack-years metric. Past 30-day frequency of use data may not be representative of cumulative exposure. Concerningly, most major national surveys, including BRFSS, NHIS, and NHANES, do not report past duration of EC use. Without accounting for the frequency and duration of EC use, studies are limited in their precision and accuracy of measuring the impact of EC exposure.
When available, it is instructive to consider this information when constructing EC exposures. For instance, PATH does include time of first ever- or regular-use of EC, as well as lifetime number of uses (Boakye et al., 2023). Figure 2, below, shows lifetime number of uses of EC for the following categories: ever-use, former use, current use, use some-days or every-day, use some days, and use every day. Of the ever-use sample, 50% reported 10 or fewer lifetime uses, 23% reported 11 to 99 uses, and 27% reported ever-use of 100+ times. Likewise, 58% of current some-day users reported fewer than 100 lifetime uses, while 83% of current every-day users reported 100+ lifetime uses.
Figure 2.
Lifetime Number of EC Uses, Stratified by EC Use Category.
PATH Wave 6 unweighted adult data is illustrated. For each cohort, the proportion of the sample which has reported lifetime uses of EC numbering 1 to 10, 11 to 50, 51 to 99, and 100+ times is color-coded.
Use state persistence or transitions in prospective studies
Prospective studies typically segment cohorts based on their product use state at the start of the study. Over the course of the prospective period, participants may continue their product use, stop use, and/or transition to other products. A precise assessment of exposure should capture these ongoing product use patterns, in addition to current and past use, as they may exacerbate or ameliorate risk.
Outcome and Result-Related Considerations
Timing of harm vs. exposure
Exposure can only be causal for harm events that occur after the exposure, and the exposure should be of sufficient dose and duration to plausibly drive the underlying physiological disease process. Consequently, retrospective studies should carefully evaluate the timing and duration of exposure, and relative timing at which harm events occurred. For example, PATH Wave 1 (W1, fielded in 2013-14) contains a sample of n=1,684 individuals who reported COPD ('ever in life'). Of these, 1,252 (74%) reported that the harm event occurred 4 or more years earlier, meaning that they had the outcome before EC were widely available in the marketplace.
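A temporal check of this kind can be sketched as follows. The record structure, field names, and the `min_exposure_years` threshold are hypothetical illustrations and do not correspond to actual PATH variable names; only the 1,252 and 1,684 counts come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    diagnosis_year: Optional[int]  # None if no harm event was reported
    ec_start_year: Optional[int]   # None if EC was never used


def event_follows_exposure(p: Participant, min_exposure_years: int = 0) -> bool:
    """True only if the harm event occurred at least `min_exposure_years`
    after EC use began (hypothetical threshold for plausible causation)."""
    if p.diagnosis_year is None or p.ec_start_year is None:
        return False
    return p.diagnosis_year - p.ec_start_year >= min_exposure_years


# Diagnosis in 2010, EC initiation in 2014: the exposure cannot be causal.
print(event_follows_exposure(Participant(2010, 2014)))
# Diagnosis four years after initiation, with a three-year minimum exposure.
print(event_follows_exposure(Participant(2018, 2014), min_exposure_years=3))

# Share of PATH W1 COPD reports dated 4 or more years earlier (counts from the text).
share_pct = round(100 * 1252 / 1684)
print(share_pct)
```

Events that fail this check would be excluded from, or analyzed separately within, any estimate of risk attributed to EC exposure.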
For assessment of relative risk of exclusive use of EC vs. CC use or vs. non-use, for some databases or endpoints, EC usage data may not be of sufficient duration for precise analysis (Cummings et al., 2024).
For instance, COPD is most typically seen after 40 or more pack-years of smoking. Criteria for diagnosis of early COPD are in development, but an exemplar assessment framework still included a minimum of 10 pack-years of CC use (Curtis et al., 2024). Generalized indicators of health may be predictive of earlier chronic changes in pulmonary function, but may lack predictive accuracy (Rennard & Drummond, 2015). Consequently, potentially more sensitive and specific assessment instruments of early pulmonary health changes are in development (Shiffman et al., 2023).
Self-reported metrics and categorization of exposure and harm events
Self-reported metrics of exposure and harm are most accurate when verified through additional means. For instance, in a recent large RCT of EC and nicotine replacement therapy (NRT) for smoking cessation, 26% of self-reports of 7-day cigarette abstinence conflicted with exhaled CO measurements in the EC group, and 33% of abstinence self-reports conflicted in the NRT group (Auer et al., 2024). Likewise, self-report of diagnosis of some health conditions can be confirmed with concordance of corresponding drug prescriptions, although it is also true that some prescriptions go unfilled in real-world settings.
Quantification of risk and risk reduction
Reporting of risk involves differentiating between background risks, not associated with tobacco use, and incremental risks associated with tobacco product use. Harm reduction involves displacing the incremental risk associated with one product by using another product with a lesser incremental risk. This is described in more detail in Box 1.
BOX 1: Measures of Relative Risk and Risk Reduction.
In an observational study, non-tobacco users often represent the normative control group for odds adjustments. The non-tobacco-using control sample should therefore have an adjusted odds ratio (aOR) of 1.0, reflecting the normalized background risk rate due to non-tobacco use sources. These risk sources may include genetic factors, lifestyle and environmental factors such as poor diet and lack of exercise, alcohol, cannabis and other drug use, and exposure to pollution and secondhand smoke (Martin et al., 2024).
The aOR observed in people who use tobacco products includes the baseline risk (of 1.0) from non-tobacco use sources, plus incremental excess risk due to use of the tobacco product (risk in excess of 1.0, see Supplement I, Equations SI-1 and SI-2). For example, CC use has been causally linked with increased risk of cardiovascular disease (CVD), stroke, and respiratory diseases (Centers for Disease Control and Prevention (US) et al., 2010; Department of Health and Human Services, 2014). Relative risk is the ratio of these excess risks due to product use (Equation SI-3).
Harm reduction refers to the reduction in the excess risk due to tobacco use, when a less harmful tobacco product is used instead of a more harmful tobacco product. The equation describing harm reduction for use of EC compared to CC is derived in Equation SI-4, and an illustrative example is provided in Figure 2, below.
Figure 2.
Calculation of Harm Reduction Magnitude (Illustrative Example). NS = non-smoker / vaper; CC = combusted cigarette user; EC = electronic cigarette user.
In this illustrative example, a non-tobacco user (NS) has an adjusted odds ratio (aOR) of 1.0 for a given harm, reflecting the background risk rate. If CC smoking doubles the risk of harm, a subject engaging in CC smoking would have aOR of 2.0. In other words, they would incur a risk of 1.0 from non-tobacco sources, and an additional incremental risk of 1.0 from CC smoking. Likewise, if using EC caused 30% of the incremental harm of CC smoking, a typical study subject using EC would have aOR of 1.3. This aOR would be composed of risk of 1.0 from non-tobacco sources and 0.3 from EC use. In this example, if the subject had used EC instead of smoking CC, their aOR would be 1.3 instead of 2.0. The harm reduction associated with EC would be 70%, due to the incremental risk of 0.3 (i.e. 1.3 - 1.0) with EC vs. 1.0 for CC (i.e. 2.0 - 1.0). Note that reduction in all sources of risk is 35% for EC vs. CC users (1.3 vs. 2.0), but harm reduction is 70% (0.3 vs. 1.0 incremental risk due specifically to tobacco product use).
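The arithmetic of the illustrative example can be reproduced with a short sketch. Equation SI-4 is in the Supplement and is not reproduced here, so the formula below is an assumption inferred from the worked example (harm reduction as one minus the ratio of excess risks); the function names are hypothetical.

```python
def excess_risk(aor: float) -> float:
    """Incremental risk above the background aOR of 1.0."""
    return aor - 1.0


def harm_reduction(aor_ec: float, aor_cc: float) -> float:
    """Assumed form: 1 - (excess risk with EC) / (excess risk with CC)."""
    if aor_cc <= 1.0:
        raise ValueError("CC must carry excess risk (aOR > 1.0) for the ratio to be defined")
    return 1.0 - excess_risk(aor_ec) / excess_risk(aor_cc)


def total_risk_reduction(aor_ec: float, aor_cc: float) -> float:
    """Reduction across all sources of risk, background included."""
    return 1.0 - aor_ec / aor_cc


# Values from the illustrative example: aOR of 1.3 for EC and 2.0 for CC.
print(f"harm reduction: {harm_reduction(1.3, 2.0):.0%}")
print(f"total risk reduction: {total_risk_reduction(1.3, 2.0):.0%}")
```

Keeping the two quantities as separate functions makes explicit that the 70% and 35% figures answer different questions: the first isolates risk attributable to product use, while the second includes background risk in the denominator.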
Verification of dose-response
Verification of a dose-response relationship between exposure and outcome is one of the most important approaches for verifying the accuracy of results. Lack of dose-response relationship should lead to examination of potential counterfactual explanations. A common dose-response comparison is between every-day, some-day, former, and never users, but as discussed previously, precision may be limited without incorporation of duration of use or cumulative exposures to all products (i.e., pack-years of CC and duration of regular use of EC, or lifetime number of EC uses).
Another validation opportunity, at least in studies using databases such as PATH which capture timing of initiation of regular use, could be to confirm that risk was not elevated until after exposure began. For example, in a population which uses EC, aOR of events in the time before starting EC should be 1.0 if covariate adjustments have canceled out all sources of bias. In other words, zero exposure dose should correspond to zero incremental risk.
Was there sufficient sample size and number of harm events for model validity?
Sample size sufficiency can be a challenge in NRSEs. A generally accepted rule of thumb is that the EPV ratio (events per variable, e.g. the number of hazard events per odds adjustment regression variable) should be at least 10 for logistic regression models to avoid “major problems” (Peduzzi et al., 1996).
Issues which can arise from insufficient EPVs include bias of regression coefficients, confidence limits which don’t properly cover the data, and paradoxical associations (significance in the wrong direction). A definitive exploration is beyond the scope of this review, but there is a need to more deeply explore this issue.
Generally, the greater the demographic imbalance between samples, and the greater the magnitude of adjustments vs. the precision of the aOR effect size and confidence interval, the more closely confounding and counterfactuals should be considered, particularly if directionality of association inverts after adjustment. Transparent reporting of the number of hazard events observed in each sample in the raw data, along with the number and magnitude of adjustment variables, can help to verify EPV ratio sufficiency and accuracy of results.
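The events-per-variable rule of thumb can be expressed as a simple check. This is a minimal sketch: the function names and the example counts are hypothetical, and the default threshold of 10 is the rule of thumb cited above rather than a guarantee of model validity.

```python
def events_per_variable(n_events: int, n_adjustment_variables: int) -> float:
    """EPV = number of harm events / number of regression adjustment variables."""
    if n_adjustment_variables <= 0:
        raise ValueError("at least one adjustment variable is required")
    return n_events / n_adjustment_variables


def epv_sufficient(n_events: int, n_adjustment_variables: int,
                   threshold: float = 10.0) -> bool:
    """Apply the rule of thumb that EPV should be at least `threshold`."""
    return events_per_variable(n_events, n_adjustment_variables) >= threshold


# Hypothetical samples, each adjusted for 9 covariates.
print(events_per_variable(45, 9), epv_sufficient(45, 9))
print(round(events_per_variable(120, 9), 1), epv_sufficient(120, 9))
```

A check like this depends on the raw event counts per sample being reported, which is why transparent reporting of those counts matters.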
Accuracy of comparisons across populations with minimal demographic overlap
Adjustments of odds ratios in the field of tobacco research typically arise from application of a linear fit in a regression model. However, linear fits may introduce confounding if variables are not independent or don’t follow linear relationships. One of the most common examples is the effect of age, where a linear fit (i.e., adjustment = m*age + b) may not be representative of effects that take years to accumulate and then manifest increasingly rapidly with age. If an EC sample is much younger than a control sample, then linear adjustments may result in an under- or overestimate of risk. Likewise, correlation between CC and EC use is higher in older adults (see Supplement I) and thus correlation of CC harm may be higher if these interdependencies are not accounted for. One mitigation strategy is to stratify age into deciles and adjust for each decile independently. While stratification may seemingly reduce analytic power by increasing the confidence interval associated with the age adjustment, it may in actuality increase accuracy of odds adjustments because of “apples to apples” comparisons.
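The effect of comparing like with like across age strata can be illustrated with a small sketch. It uses the Mantel-Haenszel pooled odds ratio, which is one common way to combine stratum-specific estimates and is not necessarily the approach used in any study discussed here. The counts are hypothetical and use only two age strata for brevity, with the exposed group concentrated in the younger stratum.

```python
from typing import List, NamedTuple


class Stratum(NamedTuple):
    exposed_cases: int
    exposed_noncases: int
    unexposed_cases: int
    unexposed_noncases: int


def odds_ratio(s: Stratum) -> float:
    """Crude odds ratio for a single 2x2 table."""
    return (s.exposed_cases * s.unexposed_noncases) / (s.exposed_noncases * s.unexposed_cases)


def mantel_haenszel_or(strata: List[Stratum]) -> float:
    """Mantel-Haenszel pooled OR: sum(a*d/n) / sum(b*c/n) over strata."""
    num = den = 0.0
    for s in strata:
        n = sum(s)
        num += s.exposed_cases * s.unexposed_noncases / n
        den += s.exposed_noncases * s.unexposed_cases / n
    return num / den


def collapse(strata: List[Stratum]) -> Stratum:
    """Ignore age entirely by summing the tables cell by cell."""
    return Stratum(*(sum(col) for col in zip(*strata)))


# Hypothetical counts: low baseline risk when young, high when old.
young = Stratum(10, 100, 1, 20)
old = Stratum(5, 5, 50, 100)
strata = [young, old]

print(odds_ratio(young), odds_ratio(old))    # within-stratum odds ratios
print(round(mantel_haenszel_or(strata), 3))  # pooled across age strata
print(round(odds_ratio(collapse(strata)), 3))  # age ignored
```

In this constructed example the odds ratio is 2.0 within each age stratum, yet collapsing across ages yields an odds ratio below 1.0, an inversion of direction driven entirely by the age imbalance between the exposed and unexposed samples.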
Transparent peer review and consideration of counterfactuals
ROBINS framework authors suggest that counterfactuals always be analyzed and recommend that:
“it is very important that experts in both subject matter and epidemiological methods are included in any team evaluating a (non-randomized study). The risk of bias assessment should begin with consideration of what problems might arise, in the context of the research question, in making a causal assessment of the effect of the intervention(s) of interest on the basis of non-randomized studies.” (Sterne et al., 2016).
Likewise, peer review of completed analyses from experts spanning perspectives can help to verify and validate study precision and accuracy.
What is the Relative Risk Associated with Displacing One Product with Another?
Population and Sample-Related Considerations
When analyzing individuals who have formerly used CC and switched to EC, the considerations discussed above for exclusive EC use still apply. In addition, the impact of exposure to two sources of harm needs to be considered.
Verification of stopping CC use
Self-reported rates of stopping smoking are typically higher than biochemically verified rates. Unfortunately, while the PATH study does capture certain biomarkers, COHb (an indicator of exposure to combustion products) is not available; it would be a valuable future addition to the survey. Likewise, biochemical verification of cannabis and other drug use and non-use would be valuable in the PATH study.
Exposure-Related Considerations
Exposure: recency of stopping CC use
An important source of potential statistical bias is accounting for the time course with which risk falls off after the exposure ends. As shown in Figure 3, mortality risk is elevated in smokers (left panel). After quitting smoking, risk trends downwards over a course of years and decades, with risk decrementing more slowly with older smokers, possibly due to more pack-years of harm accumulation (Cho et al., 2024; Klonizakis et al., 2022). The right panel of the figure shows that CVD risk is markedly lower in former smokers who stopped CC more than six years previously compared to those who are more recent quitters (Farsalinos et al., 2019). Several other studies have also reported that CVD risk declines over a time course of decades after stopping smoking (Duncan et al., 2019; Lubin, Couper, et al., 2016). Likewise, for COPD, model accuracy increased when time since quitting CC was incorporated as a predictive factor (Chang et al., 2021).
As time since switching increases, the upper limit for harm reversal approaches the level of never smoking. In studies of samples which had on average stopped smoking a decade or more earlier, time since stopping smoking was less predictive for CVD risk than other metrics such as cigarettes per day (Duncan et al., 2019; Nance et al., 2017).
Figure 3.
Impact of Time Since Stopping Smoking on CVD and Mortality Risk. Left panel: adapted from (Cho et al., 2024). Right panel: adapted from (Farsalinos et al., 2019).
Therefore, when calculating aOR among ‘exclusive’ EC users who formerly smoked CC, the correct OR adjustment comparator is not necessarily the typical ex-smoker, who may have quit a decade or more ago, but a cohort of ex-smokers which has quit on average as recently as the EC sample. EC switching impact should be measured against the maximum harm reversal which would be possible in that time period, for instance if quitting with traditional pharmaceutical approaches or with abstinence (“cold turkey”).
What is the Risk Associated with Dual-Use of Two Products?
When comparing dual-use vs. exclusive use of CC or EC, it is especially important to comprehensively and transparently characterize exposure to both products as there are multiple opposing confounds.
Population and Sample-Related Considerations
Sample selection considerations
In populations which include both EC and CC use history, isolation of dual-use (DU), CC, and EC use into separate samples can increase accuracy and precision of evaluation, due to multiple confounding factors, which include both correlations and anti-correlations between EC and CC use patterns.
Correlation of likelihood of use of EC and CC:
Common liability and switching patterns cause a positive association between ever use of CC and EC (Khouja et al., 2021; Kim & Selya, 2020). CDC data (2021 NHIS survey) showed that current EC users were 2.8x more likely than current EC non-users to be current CC users and 1.8x more likely to be former CC smokers, while 54% less likely to be never CC users (CDC, 2023). Furthermore, EC use frequency is higher among people who have a history of smoking more CC (Levy et al., 2017). (See Supplement I for more information on this topic.)
Anti-correlation of frequency of use of EC and CC:
At the same time, multiple studies have reported that frequency of CC use is negatively associated with EC use, when EC are used for quitting cigarettes (Cohen et al., 2024; Harlow et al., 2022; Kasza et al., 2024; Wang et al., 2021).
In some non-randomized observational studies, a mixed linear model approach is used to estimate risk associated with EC use and CC use as independent variables, without measuring DU risk directly (Alzahrani et al., 2018). Because of the positively and negatively correlated interactions between EC and CC use, the standard error of fit for each of these regression adjustment coefficients may be higher in the edge cases where dual-use occurs. Consequently, the impact of DU should be directly measured as an independent sample rather than extrapolated from multiplication of aOR EC * aOR CC. At a minimum, the goodness of fit and actual confidence interval for the subsample of dual users must be verified to validate the precision and accuracy of imputation in a mixed linear model.
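The verification step described above can be sketched as a comparison between the multiplicative imputation and a directly measured dual-use estimate. All numbers below are hypothetical and chosen only for illustration; the function names are likewise assumptions, not part of any cited analysis.

```python
from typing import Tuple


def multiplicative_du_estimate(aor_ec: float, aor_cc: float) -> float:
    """Dual-use aOR imputed under the multiplicative assumption aOR_EC * aOR_CC."""
    return aor_ec * aor_cc


def within_interval(value: float, interval: Tuple[float, float]) -> bool:
    """True if `value` lies inside the (lower, upper) confidence interval."""
    lower, upper = interval
    return lower <= value <= upper


# Hypothetical model coefficients for exclusive EC and exclusive CC use.
imputed = multiplicative_du_estimate(aor_ec=1.3, aor_cc=2.0)

# Hypothetical directly measured aOR for the dual-use subsample, with its 95% CI.
measured_aor, measured_ci = 1.9, (1.5, 2.4)

print(round(imputed, 2), within_interval(imputed, measured_ci))
```

When the imputed value falls outside the interval measured in the dual-use subsample, as in this constructed case, the multiplicative extrapolation is not supported by the data for that subsample and the direct estimate should be preferred.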
Case-control designs should also be avoided (harm vs. no-harm cohort designs). Because of correlation of use patterns of EC and CC, along with common liability influences, it is important to stratify populations into EC, CC, and dual use samples rather than stratifying into samples that have experienced harm vs. no harm. The case study in Box 2 includes two cautionary examples of studies which stratified harm vs. no harm samples. These studies consequently reported that CC use caused no cardiovascular harm (with former CC use being protective in one study) and attributed the entirety of tobacco product use harm to EC use (El-Shahawy et al., 2022; Gathright et al., 2020).
Exposure-Related Considerations
Measurement of exposure to two products
Biomarker studies suggest that EC use which displaces CC use causes a reduction in exposure to CC toxins (Holt et al., 2023). The counterargument has been made that each type of product presents unique risks and so DU risk is EC risk x CC risk (Alzahrani et al., 2018). For precision and accuracy in assessing the impact of DU vs. CC or EC use, it is critical to comprehensively characterize the exposure history of both products in all samples. Evaluating the dose response for each type of exposure (CC pack-years vs. EC use frequency x duration of regular use) within the DU sample in an NRSE may help to differentiate between these two hypotheses.