Subject:
Computer Science And Mathematics,
Probability And Statistics
Keywords:
Directed Acyclic Graph; DAG; confounding; collider bias; epistemology; inferential statistics
Online: 8 October 2022 (02:59:34 CEST)
Directed acyclic graphs (DAGs) are nonparametric causal path diagrams that have substantial utility as principled representations of disease and healthcare pathways, and of the underlying ‘data generating mechanisms’ these pathways involve. As such, DAGs provide a valuable bridge between: the aetiological knowledge, operational insight and professional experience on which clinical training and practice depend; and the more abstract epistemological and analytical considerations required to extract robust statistical insight from health and healthcare data. DAGs are nonetheless vulnerable to imperfect biomedical paradigms, partial clinical knowledge and limited empirical data. DAGs drawn under such circumstances offer limited scope for statistical insight free from cognitive, analytical or inferential bias if: they misrepresent the data generating mechanisms involved; or ignore the important role that omitted variables (whether measured, unmeasured or unacknowledged) might play therein. To address these weaknesses and broaden the appeal and application of DAGs, this chapter provides ten simple steps that educators can use to improve the analytical competence and statistical confidence of the healthcare students, qualified practitioners and experienced researchers they support. These steps use temporal logic to draw DAGs so as to: reduce reliance on uncertain knowledge, incomplete information, flawed assumptions or guesswork; and avoid, mitigate or acknowledge the errors and biases that each of these incurs. The chapter comprises an accessible, non-technical overview of the perspective and thoughtfulness required to generate temporally coherent DAGs as objective representations of the probabilistic causal paths involved in context-specific data generating mechanisms.
It encourages a focus on those variables operating as potential sources of analytical or inferential bias when estimating the plausible, probabilistic causal relationship between two pre-specified variables; and specifically addresses the challenges posed by: omitted; time-variant; non-asynchronous; and temporally obscure variables. The chapter includes a worked example based on a published clinical study to demonstrate how each of the steps required to generate temporally-informed DAGs can be applied to: critically appraise the analytical decisions made during applied healthcare research; and inform the decisions required when designing, undertaking and analysing primary and secondary, prospective and retrospective research. The appendices include a summary of ten recommendations for improving the reporting and interrogability of DAGs and DAG-informed analyses.
Subject:
Medicine And Pharmacology,
Other
Keywords:
directed acyclic graph; DAG; causal inference; bias; inferential statistics; reproducibility
Online: 8 October 2022 (02:57:44 CEST)
The origins of directed acyclic graphs (DAGs) date back to the emergence of ‘graph theory’ in the early 1700s (Biggs et al., 1986). DAGs are conceptual or literal, diagrammatic representations of causal paths between variables, constructed – as their name suggests – on the basis of two overriding principles: first, that all causal paths are ‘directed’ (i.e. for each pair of variables, only one can represent the cause, while the other must be its consequence); and second, that no direct cyclical paths, or indirect cyclical pathways (comprising sequences of consecutive paths), are allowed, such that no consequence can be considered its own direct or indirect cause (hence ‘acyclic’; Law et al., 2012). As such, DAGs reflect the knowledge, presumptions, assumptions and/or speculation of the analyst(s) concerned regarding the causal relationships between each of the variables included therein. Current convention dictates that variables are represented as nodes/vertices, and that any causal paths between variables are represented as directed arcs/edges/lines, often in the form of arrows (see Figure 1). Although each arc indicates the presence and direction of a known/presumed/assumed/speculative causal relationship between the two variables concerned, drawing an arc does not require the sign, magnitude, precision or shape of the relationship to be known or declared (Tennant et al., 2021). In this respect, DAGs provide a simple, accessible and entirely nonparametric approach for postulating causal relationships amongst any variables of interest even when these are uncertain, unknown or entirely speculative (Ellison, 2020).
Nonetheless, as a result of the parametric constraints imposed by the presence/absence of possible arcs within any given DAG, DAGs also reflect and support a number of more sophisticated statistical applications that make it possible to use them to inform the design of multivariable statistical models reflecting the causal structure(s) involved – albeit without the need to know or understand the mathematical technicalities on which these are based (Lewis and Kuerbis, 2016). These features make DAGs attractive cognitive, educational and analytical tools for strengthening the epistemological, theoretical and empirical basis of causal inference, and there has been a recent proliferation in the use of DAGs across a range of applied scientific disciplines (e.g. Knight and Winship, 2013), and an associated upsurge in analytical methods training (e.g. Elwert, 2011; Gilthorpe, 2017; Hernán, 2018; Roy, 2021; Hünermund, 2021). This chapter reflects on a decade of delivering medical statistics training to undergraduate medical students at the University of Leeds (2012–2021), in which the third year research, evaluation and special studies module (‘RESS3’) has used DAGs to support the development of applied statistical skills relevant to the extended student-selected research and evaluation projects (ESREP) students undertake in their fourth and final years (Ellison, 2021; Ellison et al., 2014a,b). Based on successive iterations of the structure and content of the RESS3 module, together with notes made during formal and informal planning and review meetings with module leads, lecturers, tutors and students, we draw on the claims and criticisms made of DAGs in the epidemiological literature to identify a number of explicit strengths (and associated, often implicit, weaknesses) that are central to their use in prediction and causal inference modelling.
While using DAGs requires (and benefits from) a clear understanding of their non-parametric nature and parametric implications, the weaknesses of DAGs seem likely to reflect both: the challenges inherent in the modelling of data generating processes when these are imperfectly understood; and troublesome cognitive and heuristic tendencies common to all analytical tools – in which the tool facilitates the task in hand by reducing the necessity (and benefits) of exploring uncertainties and identifying assumptions. These more epistemological considerations appear particularly challenging for medical undergraduates to grasp (Ellison, 2021), but also appear poorly understood by many established analysts and clinical epidemiologists (Ellison, 2020).
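The two overriding principles described in the abstract above – that every path is directed, and that no variable may be its own direct or indirect cause – can be checked mechanically. The sketch below is an illustrative addition (not drawn from the chapter itself; the variable names are hypothetical): it uses Kahn's topological-sort algorithm to test whether a proposed set of directed cause→effect paths is genuinely acyclic, and therefore a valid DAG.

```python
from collections import defaultdict, deque

def is_acyclic(edges):
    """Return True if the directed graph given as (cause, effect)
    pairs admits a topological order, i.e. contains no cycle
    (Kahn's algorithm)."""
    indegree = defaultdict(int)
    adjacent = defaultdict(list)
    nodes = set()
    for cause, effect in edges:
        adjacent[cause].append(effect)
        indegree[effect] += 1
        nodes.update((cause, effect))

    # Start from variables with no postulated causes in the diagram.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for successor in adjacent[node]:
            indegree[successor] -= 1
            if indegree[successor] == 0:
                queue.append(successor)

    # If any node was never reached, a cycle must remain.
    return visited == len(nodes)

# Hypothetical examples: a simple causal chain is acyclic,
# whereas A -> B -> C -> A would make each variable its own cause.
print(is_acyclic([("smoking", "tar"), ("tar", "cancer")]))  # True
print(is_acyclic([("A", "B"), ("B", "C"), ("C", "A")]))     # False
```

Any edge list for which `is_acyclic` returns `False` implies that some variable would be its own direct or indirect cause, violating the acyclicity requirement.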
Subject:
Computer Science And Mathematics,
Computer Science
Keywords:
COVID-19; description; prediction; causal inference; extrapolation; simulation; projection
Online: 10 August 2020 (10:44:46 CEST)
The models used to estimate disease transmission, susceptibility and severity determine what epidemiology can (and cannot) tell us about COVID-19. These include: ‘model organisms’ chosen for their phylogenetic/aetiological similarities; multivariable statistical models to estimate the strength/direction of (potentially causal) relationships between variables (through ‘causal inference’), and the (past/future) value of unmeasured variables (through ‘classification/prediction’); and a range of modelling techniques to predict beyond the available data (through ‘extrapolation’), compare different hypothetical scenarios (through ‘simulation’), and estimate key features of dynamic processes (through ‘projection’). Each of these models: addresses different questions using different techniques; involves assumptions that require careful assessment; and is vulnerable to generic and specific biases that can undermine the validity and interpretation of its findings. It is therefore necessary that the models used: can actually address the questions posed; and have been competently applied. In this regard, it is important to stress that extrapolation, simulation and projection cannot offer accurate predictions of future events when the underlying mechanisms (and the contexts involved) are poorly understood and subject to change. Given the importance of understanding such mechanisms/contexts, and the limited opportunity for experimentation during outbreaks of novel diseases, the use of multivariable statistical models to estimate the strength/direction of potentially causal relationships between two variables (and the biases incurred through their misapplication/misinterpretation) warrants particular attention.
Such models must be carefully designed to address: ‘selection-collider bias’, ‘unadjusted confounding bias’ and ‘inferential mediator adjustment bias’ – all of which can introduce effects capable of enhancing, masking or reversing the estimated (true) causal relationship between the two variables examined. Selection-collider bias occurs when these two variables independently cause a third (the ‘collider’), and when this collider determines/reflects the basis for selection in the analysis. It is likely to affect all incompletely representative samples, although its effects will be most pronounced wherever selection is constrained (e.g. analyses focusing on infected/hospitalised individuals). Unadjusted confounding bias disrupts the estimated (true) causal relationship between two variables when: these share one (or more) common cause(s); and when the effects of these causes have not been adjusted for in the analyses (e.g. whenever confounders are unknown/unmeasured). Inferentially similar biases can occur when: one (or more) variable(s) (or ‘mediators’) fall on the causal path between the two variables examined (i.e. when such mediators are caused by one of the variables and are causes of the other); and when these mediators are adjusted for in the analysis. Such adjustment is commonplace when: mediators are mistaken for confounders; prediction models are mistakenly repurposed for causal inference; or mediator adjustment is used to estimate direct and indirect causal relationships (in a mistaken attempt at ‘mediation analysis’). These three biases are central to ongoing and unresolved epistemological tensions within epidemiology. All have substantive implications for our understanding of COVID-19, and the future application of artificial intelligence to ‘data-driven’ modelling of similar phenomena. 
Nonetheless, competently applied and carefully interpreted, multivariable statistical models may yet provide sufficient insight into mechanisms and contexts to permit more accurate projections of future disease outbreaks.
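The selection-collider mechanism summarised in the abstract above can be made concrete with a minimal simulation (an illustrative sketch, not taken from the paper itself; the hospitalisation framing is an assumed example): two entirely independent causes of selection become spuriously associated once the analysis is restricted to the selected subsample.

```python
import random

def simulate_collider_bias(n=200_000, seed=1):
    """Illustrative (assumed) example: X and Y are independent,
    but both raise the chance of selection S (e.g. hospitalisation).
    Conditioning on S induces a spurious X-Y association."""
    random.seed(seed)
    data = []
    for _ in range(n):
        x = random.gauss(0, 1)   # e.g. one risk factor
        y = random.gauss(0, 1)   # an independent second factor
        s = (x + y) > 1.0        # collider: selected into the sample
        data.append((x, y, s))

    def corr(pairs):
        m = len(pairs)
        mx = sum(p[0] for p in pairs) / m
        my = sum(p[1] for p in pairs) / m
        cov = sum((p[0] - mx) * (p[1] - my) for p in pairs) / m
        vx = sum((p[0] - mx) ** 2 for p in pairs) / m
        vy = sum((p[1] - my) ** 2 for p in pairs) / m
        return cov / (vx * vy) ** 0.5

    full = corr([(x, y) for x, y, _ in data])
    selected = corr([(x, y) for x, y, s in data if s])
    return full, selected

full, selected = simulate_collider_bias()
print(f"whole sample r = {full:.2f}")        # near zero: X, Y independent
print(f"selected-only r = {selected:.2f}")   # substantially negative
```

Restricting the analysis to the ‘hospitalised’ subsample manufactures a strong negative correlation between two variables that are, by construction, causally unrelated – precisely the bias the abstract warns will affect all incompletely representative samples.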
Subject:
Social Sciences,
Decision Sciences
Keywords:
Cognitive bias; Intelligence analysis; Intelligence assessment; Confirmation bias; Decision-making; Information processing
Online: 18 January 2024 (09:47:50 CET)
Aileen Oeberst and Roland Imhoff’s remarkable paper (Perspectives on Psychological Science 2023; 18: 1464-87) should be compulsory reading for all intelligence analysts. It offers a tantalisingly parsimonious explanation of cognitive biases: the combination of strongly held prior 'beliefs' with subsequent 'belief-consistent information processing' (akin to confirmation bias) – an explanation that simplifies understanding of a vast array of cognitive biases, and thereby suggests that all of these might be attenuated through the application of accessible analytical and assessment practices. What this means for intelligence analysis will be determined by the explicit and implicit 'beliefs' (or, more generally, ‘value[s]’) attributed to any evidence available, and any insight generated, to reduce decision-makers’ uncertainty and unsubstantiated certainty, and thereby offer them future advantage. Analysts are well aware that different sources of evidence (whether empirical, theoretical or entirely speculative) can assign different types and amounts of ‘prior value’ to the information available for intelligence analysis and assessment. Acknowledging such value as a potential driver of subsequent confirmation bias should help analysts guard against any inherent tendency to preference evidence solely on the basis that it was initially considered most ‘valuable’. Instead, they should subject all evidence and all insight – regardless of perceived ‘value’ – to a consistent battery of systematic, rigorous and robust evaluation.
Subject:
Social Sciences,
Decision Sciences
Keywords:
Wargaming; Artificial Intelligence; AI; Decision-making
Online: 17 January 2024 (12:09:12 CET)
This article offers a pragmatic ‘epistemology of wargaming’ which views wargames as immersive ‘thought experiments’ in which the human players involved use their experiential, empirical, and theoretical knowledge – together with whatever cognitive models they are able to deploy, or develop anew – to generate a conceptual, operational understanding of the adversarial scenario in which they are immersed; and exploit this understanding to craft tactical decisions designed to optimise the likelihood they will achieve their strategic objectives. From this perspective, contemporary interest in the use of ‘AI’-enabled tools to augment the validity of wargaming outputs – where these outputs constitute the decisions players make and the insights such decisions reveal – might most purposefully focus on: the design and implementation of wargames (to strengthen the architecture these provide to support immersive decision-making); and the analysis of players’ decisions (to better understand the cognitive models these involve and reflect). While the focus we suggest might disappoint those keen to replace human players with (semi-) autonomous decision-making machines, as long as the principal objectives of wargaming are to assess and enhance the decision-making capabilities of human players and human personnel, ‘AI’-enabled applications can only ever play a supporting role (albeit a potentially invaluable one) in the design, presentation, implementation, and analysis of wargames. As Irving Berlin might have it: ‘AI’ might soon be able to do most things better than us, but it can never replace humans when only a human will do.