Subject:
Computer Science And Mathematics,
Probability And Statistics
Keywords:
Directed Acyclic Graph; DAG; confounding; collider bias; epistemology; inferential statistics
Online: 8 October 2022 (02:59:34 CEST)
Directed acyclic graphs (DAGs) are nonparametric causal path diagrams that have substantial utility as principled representations of disease and healthcare pathways, and of the underlying ‘data generating mechanisms’ these pathways involve. As such, DAGs provide a valuable bridge between: the aetiological knowledge, operational insight and professional experience on which clinical training and practice depend; and the more abstract epistemological and analytical considerations required to extract robust statistical insight from health and healthcare data. DAGs are nonetheless vulnerable to imperfect biomedical paradigms, partial clinical knowledge and limited empirical data. DAGs drawn under such circumstances offer limited scope for statistical insight free from cognitive, analytical or inferential bias if: they misrepresent the data generating mechanisms involved; or ignore the important role that omitted variables (whether measured, unmeasured or unacknowledged) might play therein. To address these weaknesses and broaden the appeal and application of DAGs, this chapter provides ten simple steps that educators can use to improve the analytical competence and statistical confidence of the healthcare students, qualified practitioners and experienced researchers they support. These steps use temporal logic to draw DAGs so as to: reduce reliance on uncertain knowledge, incomplete information, flawed assumptions or guesswork; and avoid, mitigate or acknowledge the errors and biases that each of these incurs. The chapter comprises an accessible, non-technical overview of the perspective and thoughtfulness required to generate temporally coherent DAGs as objective representations of the probabilistic causal paths involved in context-specific data generating mechanisms.
It encourages a focus on those variables operating as potential sources of analytical or inferential bias when estimating the plausible, probabilistic causal relationship between two pre-specified variables; and specifically addresses the challenges posed by: omitted; time-variant; non-asynchronous; and temporally obscure variables. The chapter includes a worked example based on a published clinical study to demonstrate how each of the steps required to generate temporally-informed DAGs can be applied to: critically appraise the analytical decisions made during applied healthcare research; and inform the decisions required when designing, undertaking and analysing primary and secondary, prospective and retrospective research. The appendices include a summary of ten recommendations for improving the reporting and interrogability of DAGs and DAG-informed analyses.
Subject:
Medicine And Pharmacology,
Other
Keywords:
directed acyclic graph; DAG; causal inference; bias; inferential statistics; reproducibility
Online: 8 October 2022 (02:57:44 CEST)
The origins of directed acyclic graphs (DAGs) date back to the emergence of ‘graph theory’ in the early 1700s (Biggs et al., 1986). DAGs are conceptual or literal, diagrammatic representations of causal paths between variables, constructed – as their name suggests – on the basis of two overriding principles: first, that all causal paths are ‘directed’ (i.e. for each pair of variables, only one can represent the cause, while the other must be its consequence); and second, that no direct cyclical paths, or indirect cyclical pathways (comprising sequences of consecutive paths), are allowed, such that no consequence can be considered its own direct or indirect cause (hence ‘acyclic’; Law et al., 2012). As such, DAGs reflect the knowledge, presumptions, assumptions and/or speculation of the analyst(s) concerned regarding the causal relationships between each of the variables included therein. Current convention dictates that variables are represented as nodes/vertices, and that any causal paths between variables are represented as directed arcs/edges/lines, often in the form of arrows (see Figure 1). Although each arc indicates the presence and direction of a known/presumed/assumed/speculative causal relationship between the two variables concerned, drawing an arc does not require the sign, magnitude, precision or shape of the relationship to be known or declared (Tennant et al., 2021). In this respect, DAGs provide a simple, accessible and entirely nonparametric approach for postulating causal relationships amongst any variables of interest even when these are uncertain, unknown or entirely speculative (Ellison, 2020).
Nonetheless, as a result of the parametric constraints imposed by the presence/absence of possible arcs within any given DAG, DAGs also reflect and support a number of more sophisticated statistical applications that make it possible to use them to inform the design of multivariable statistical models reflecting the causal structure(s) involved – albeit without the need to know or understand the mathematical technicalities on which these are based (Lewis and Kuerbis, 2016). These features make DAGs attractive cognitive, educational and analytical tools for strengthening the epistemological, theoretical and empirical basis of causal inference, and there has been a recent proliferation in the use of DAGs across a range of applied scientific disciplines (e.g. Knight and Winship, 2013), and an associated upsurge in analytical methods training (e.g. Elwert, 2011; Gilthorpe, 2017; Hernán, 2018; Roy, 2021; Hünermund, 2021). This chapter reflects on a decade of delivering medical statistics training to undergraduate medical students at the University of Leeds (2012–2021), in which the third year research, evaluation and special studies module (‘RESS3’) has used DAGs to support the development of applied statistical skills relevant to the extended student-selected research and evaluation projects (ESREP) students undertake in their fourth and final years (Ellison, 2021; Ellison et al., 2014a,b). Based on successive iterations of the structure and content of the RESS3 module, together with notes made during formal and informal planning and review meetings with module leads, lecturers, tutors and students, we draw on the claims and criticisms made of DAGs in the epidemiological literature to identify a number of explicit strengths (and associated, often implicit, weaknesses) that are central to their use in prediction and causal inference modelling.
While using DAGs requires (and benefits from) a clear understanding of their non-parametric nature and parametric implications, the weaknesses of DAGs seem likely to reflect both: the challenges inherent in the modelling of data generating processes when these are imperfectly understood; and troublesome cognitive and heuristic tendencies common to all analytical tools – in which the tool facilitates the task in hand by reducing the necessity (and benefits) of exploring uncertainties and identifying assumptions. These more epistemological considerations appear particularly challenging for medical undergraduates to grasp (Ellison, 2021), but also appear poorly understood by many established analysts and clinical epidemiologists (Ellison, 2020).
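The two overriding principles described in the abstract above – that every path is directed, and that no variable may be its own direct or indirect cause – can be checked mechanically. The sketch below is an illustrative addition (not drawn from the chapter itself; the variable names are hypothetical): it uses Kahn's topological-sort algorithm to test whether a proposed set of directed cause→effect paths is genuinely acyclic, and therefore a valid DAG.

```python
from collections import defaultdict, deque

def is_acyclic(edges):
    """Return True if the directed graph given as (cause, effect)
    pairs admits a topological order, i.e. contains no cycle
    (Kahn's algorithm)."""
    indegree = defaultdict(int)
    adjacent = defaultdict(list)
    nodes = set()
    for cause, effect in edges:
        adjacent[cause].append(effect)
        indegree[effect] += 1
        nodes.update((cause, effect))

    # Start from variables with no postulated causes in the diagram.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for successor in adjacent[node]:
            indegree[successor] -= 1
            if indegree[successor] == 0:
                queue.append(successor)

    # If any node was never reached, a cycle must remain.
    return visited == len(nodes)

# Hypothetical examples: a simple causal chain is acyclic,
# whereas A -> B -> C -> A would make each variable its own cause.
print(is_acyclic([("smoking", "tar"), ("tar", "cancer")]))  # True
print(is_acyclic([("A", "B"), ("B", "C"), ("C", "A")]))     # False
```

Any edge list for which `is_acyclic` returns `False` implies that some variable would be its own direct or indirect cause, violating the acyclicity requirement.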
Subject:
Computer Science And Mathematics,
Computer Science
Keywords:
COVID-19; description; prediction; causal inference; extrapolation; simulation; projection
Online: 10 August 2020 (10:44:46 CEST)
The models used to estimate disease transmission, susceptibility and severity determine what epidemiology can (and cannot) tell us about COVID-19. These include: ‘model organisms’ chosen for their phylogenetic/aetiological similarities; multivariable statistical models to estimate the strength/direction of (potentially causal) relationships between variables (through ‘causal inference’), and the (past/future) value of unmeasured variables (through ‘classification/prediction’); and a range of modelling techniques to predict beyond the available data (through ‘extrapolation’), compare different hypothetical scenarios (through ‘simulation’), and estimate key features of dynamic processes (through ‘projection’). Each of these models: addresses different questions using different techniques; involves assumptions that require careful assessment; and is vulnerable to generic and specific biases that can undermine the validity and interpretation of its findings. It is therefore necessary that the models used: can actually address the questions posed; and have been competently applied. In this regard, it is important to stress that extrapolation, simulation and projection cannot offer accurate predictions of future events when the underlying mechanisms (and the contexts involved) are poorly understood and subject to change. Given the importance of understanding such mechanisms/contexts, and the limited opportunity for experimentation during outbreaks of novel diseases, the use of multivariable statistical models to estimate the strength/direction of potentially causal relationships between two variables (and the biases incurred through their misapplication/misinterpretation) warrants particular attention.
Such models must be carefully designed to address: ‘selection-collider bias’, ‘unadjusted confounding bias’ and ‘inferential mediator adjustment bias’ – all of which can introduce effects capable of enhancing, masking or reversing the estimated (true) causal relationship between the two variables examined. Selection-collider bias occurs when these two variables independently cause a third (the ‘collider’), and when this collider determines/reflects the basis for selection in the analysis. It is likely to affect all incompletely representative samples, although its effects will be most pronounced wherever selection is constrained (e.g. analyses focusing on infected/hospitalised individuals). Unadjusted confounding bias disrupts the estimated (true) causal relationship between two variables when: these share one (or more) common cause(s); and when the effects of these causes have not been adjusted for in the analyses (e.g. whenever confounders are unknown/unmeasured). Inferentially similar biases can occur when: one (or more) variable(s) (or ‘mediators’) fall on the causal path between the two variables examined (i.e. when such mediators are caused by one of the variables and are causes of the other); and when these mediators are adjusted for in the analysis. Such adjustment is commonplace when: mediators are mistaken for confounders; prediction models are mistakenly repurposed for causal inference; or mediator adjustment is used to estimate direct and indirect causal relationships (in a mistaken attempt at ‘mediation analysis’). These three biases are central to ongoing and unresolved epistemological tensions within epidemiology. All have substantive implications for our understanding of COVID-19, and the future application of artificial intelligence to ‘data-driven’ modelling of similar phenomena. 
Nonetheless, competently applied and carefully interpreted, multivariable statistical models may yet provide sufficient insight into mechanisms and contexts to permit more accurate projections of future disease outbreaks.
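The selection-collider mechanism summarised in the abstract above can be made concrete with a minimal simulation (an illustrative sketch, not taken from the paper itself; the hospitalisation framing is an assumed example): two entirely independent causes of selection become spuriously associated once the analysis is restricted to the selected subsample.

```python
import random

def simulate_collider_bias(n=200_000, seed=1):
    """Illustrative (assumed) example: X and Y are independent,
    but both raise the chance of selection S (e.g. hospitalisation).
    Conditioning on S induces a spurious X-Y association."""
    random.seed(seed)
    data = []
    for _ in range(n):
        x = random.gauss(0, 1)   # e.g. one risk factor
        y = random.gauss(0, 1)   # an independent second factor
        s = (x + y) > 1.0        # collider: selected into the sample
        data.append((x, y, s))

    def corr(pairs):
        m = len(pairs)
        mx = sum(p[0] for p in pairs) / m
        my = sum(p[1] for p in pairs) / m
        cov = sum((p[0] - mx) * (p[1] - my) for p in pairs) / m
        vx = sum((p[0] - mx) ** 2 for p in pairs) / m
        vy = sum((p[1] - my) ** 2 for p in pairs) / m
        return cov / (vx * vy) ** 0.5

    full = corr([(x, y) for x, y, _ in data])
    selected = corr([(x, y) for x, y, s in data if s])
    return full, selected

full, selected = simulate_collider_bias()
print(f"whole sample r = {full:.2f}")        # near zero: X, Y independent
print(f"selected-only r = {selected:.2f}")   # substantially negative
```

Restricting the analysis to the ‘hospitalised’ subsample manufactures a strong negative correlation between two variables that are, by construction, causally unrelated – precisely the bias the abstract warns will affect all incompletely representative samples.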
Subject:
Social Sciences,
Decision Sciences
Keywords:
Cognitive bias; Intelligence analysis; Intelligence assessment; Confirmation bias; Decision-making; Information processing
Online: 18 January 2024 (09:47:50 CET)
Aileen Oeberst and Roland Imhoff’s remarkable paper (Perspectives on Psychological Science 2023; 18: 1464-87) should be compulsory reading for all intelligence analysts. It offers a tantalisingly parsimonious explanation of cognitive biases: the combination of strongly held prior 'beliefs' with subsequent 'belief-consistent information processing' (akin to confirmation bias) – an explanation that simplifies understanding of a vast array of cognitive biases, and thereby suggests that all of these might be attenuated through the application of accessible analytical and assessment practices. What this means for intelligence analysis will be determined by the explicit and implicit 'beliefs' (or, more generally, ‘value[s]’) attributed to any evidence available, and any insight generated, to reduce decision-makers’ uncertainty and unsubstantiated certainty, and thereby offer them future advantage. Analysts are well aware that different sources of evidence (whether empirical, theoretical or entirely speculative) can assign different types and amounts of ‘prior value’ to the information available for intelligence analysis and assessment. Acknowledging such value as a potential driver of subsequent confirmation bias should help analysts guard against any inherent tendency to preference evidence solely on the basis that it was initially considered most ‘valuable’. Instead, they should subject all evidence and all insight – regardless of perceived ‘value’ – to a consistent battery of systematic, rigorous and robust evaluation.
Subject:
Social Sciences,
Decision Sciences
Keywords:
Wargaming; Artificial Intelligence; AI; Decision-making
Online: 17 January 2024 (12:09:12 CET)
This article offers a pragmatic ‘epistemology of wargaming’ which views wargames as immersive ‘thought experiments’ in which the human players involved use their experiential, empirical, and theoretical knowledge – together with whatever cognitive models they are able to deploy, or develop anew – to generate a conceptual, operational understanding of the adversarial scenario in which they are immersed; and exploit this understanding to craft tactical decisions designed to optimise the likelihood they will achieve their strategic objectives. From this perspective, contemporary interest in the use of ‘AI’-enabled tools to augment the validity of wargaming outputs – where these outputs constitute the decisions players make and the insights such decisions reveal – might most purposefully focus on: the design and implementation of wargames (to strengthen the architecture these provide to support immersive decision-making); and the analysis of players’ decisions (to better understand the cognitive models these involve and reflect). While the focus we suggest might disappoint those keen to replace human players with (semi-) autonomous decision-making machines, as long as the principal objectives of wargaming are to assess and enhance the decision-making capabilities of human players and human personnel, ‘AI’-enabled applications can only ever play a supporting role (albeit a potentially invaluable one) in the design, presentation, implementation, and analysis of wargames. As Irving Berlin might have it: ‘AI’ might soon be able to do most things better than us, but it can never replace humans when only a human will do.