Preprint
Essay

Using Directed Acyclic Graphs (Dags) to Represent the Data Generating Mechanisms of Disease and Healthcare Pathways: A Guide for Educators, Students, Practitioners and Researchers

Altmetrics

Downloads

308

Views

156

Comments

1

A peer-reviewed article of this preprint also exists.

This version is not peer-reviewed

Submitted:

23 September 2022

Posted:

08 October 2022

You are already at the latest version

Alerts
Abstract
Directed acyclic graphs (DAGs) are nonparametric causal path diagrams that have substantial utility as principled representations of disease and healthcare pathways, and of the underlying ‘data generating mechanisms’ these pathways involve. As such, DAGs provide a valuable bridge between: the aetiological knowledge, operational insight and professional experience on which clinical training and practice depend; and the more abstract epistemological and analytical considerations required to extract robust statistical insight from health and healthcare data. DAGs are nonetheless vulnerable to imperfect biomedical paradigms, partial clinical knowledge and limited empirical data. DAGs drawn under such circumstances offer limited scope for statistical insight free from cognitive, analytical or inferential bias if: they misrepresent the data generating mechanisms involved; or ignore the important role that omitted variables (whether measured, unmeasured or unacknowledged) might play therein. To address these weaknesses and broaden the appeal and application of DAGs, this chapter provides ten simple steps that educators can use to improve the analytical competence and statistical confidence of the healthcare students, qualified practitioners and experienced researchers they support. These steps use temporal logic to draw DAGs so as to: reduce reliance on uncertain knowledge, incomplete information, flawed assumptions or guesswork; and avoid, mitigate or acknowledge the errors and biases that each of these incur. The chapter comprises an accessible, non-technical overview of the perspective and thoughtfulness required to generate temporally coherent DAGs as objective representations of the probabilistic causal paths involved in context-specific data generating mechanisms. It encourages a focus on those variables operating as potential sources of analytical or inferential bias when estimating the plausible, probabilistic causal relationship between two pre-specified variables; and specifically addresses the challenges posed by: omitted; time-variant; non-asynchronous; and temporally obscure variables. The chapter includes a worked example based on a published clinical study to demonstrate how each of the steps required to generate temporally-informed DAGs can be applied to: critically appraise the analytical decisions made during applied healthcare research; and inform the decisions required when designing, undertaking and analysing primary and secondary, prospective and retrospective research. The appendices include a summary of ten recommendations for improving the reporting and interrogability of DAGs and DAG-informed analyses.
Keywords: 
Subject: Computer Science and Mathematics  -   Probability and Statistics
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Prerpints.org logo

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

© 2024 MDPI (Basel, Switzerland) unless otherwise stated