Preprint Review Version 1 This version is not peer-reviewed

CD2K: An Accessible Framework for Leveraging Public Omics Data in Biomedical Research Training

Version 1: Received: 10 October 2024 / Approved: 11 October 2024 / Online: 11 October 2024 (17:09:54 CEST)

How to cite: Rinchai, D.; Garand, M.; Toufiq, M.; Kabeer, B. S. A.; Marr, N.; Chaussabel, D. CD2K: An Accessible Framework for Leveraging Public Omics Data in Biomedical Research Training. Preprints 2024, 2024100933. https://doi.org/10.20944/preprints202410.0933.v1

Abstract

The exponential growth of publicly available omics data presents both an opportunity and a challenge for biomedical researchers, particularly those in low- and middle-income countries (LMICs). The Collective Omics Data to Knowledge (CD2K) initiative aims to address this challenge by providing an accessible framework for biomedical research training. This review describes the development and implementation of the CD2K program, which comprises three core modules: COD1 (reductionist interpretation of collective omics data), COD2 (creation of curated dataset collections), and COD3 (re-analysis of omics data on a global scale). The CD2K approach emphasizes the reuse and reinterpretation of public data, integrating literature mining and emerging technologies like Large Language Models (LLMs). A key feature of the program is its focus on accessibility, designed to make the exploitation of large-scale datasets amenable to researchers without extensive data science skills. The curriculum aims to equip trainees with a range of skills, from basic data interpretation to more advanced bioinformatics analysis, with an emphasis on producing tangible outputs such as peer-reviewed publications, which directly address career development needs. The CD2K initiative has involved researchers from multiple institutions across several countries, resulting in several publications and publicly available dataset collections. While still in its early stages, the program shows promise in providing a structured framework for leveraging public omics data in biomedical research. This review also discusses the current limitations of the CD2K approach and ongoing efforts to expand its reach. By offering an accessible model for building research capacity, the CD2K initiative represents a step towards fostering data-driven discovery in global biomedical research, particularly in resource-limited settings.

Keywords

Omics data; Biomedical research training; Data reuse; Low- and middle-income countries (LMICs)

Subject

Biology and Life Sciences, Immunology and Microbiology
