Version 1
Received: 10 October 2024 / Approved: 11 October 2024 / Online: 11 October 2024 (17:09:54 CEST)
How to cite:
Rinchai, D.; Garand, M.; Toufiq, M.; Kabeer, B. S. A.; Marr, N.; Chaussabel, D. CD2K: An Accessible Framework for Leveraging Public Omics Data in Biomedical Research Training. Preprints 2024, 2024100933. https://doi.org/10.20944/preprints202410.0933.v1
APA Style
Rinchai, D., Garand, M., Toufiq, M., Kabeer, B. S. A., Marr, N., & Chaussabel, D. (2024). CD2K: An Accessible Framework for Leveraging Public Omics Data in Biomedical Research Training. Preprints. https://doi.org/10.20944/preprints202410.0933.v1
Chicago/Turabian Style
Rinchai, D., Nico Marr and Damien Chaussabel. 2024. "CD2K: An Accessible Framework for Leveraging Public Omics Data in Biomedical Research Training." Preprints. https://doi.org/10.20944/preprints202410.0933.v1
Abstract
The exponential growth of publicly available omics data presents both an opportunity and a challenge for biomedical researchers, particularly those in low- and middle-income countries (LMICs). The Collective Omics Data to Knowledge (CD2K) initiative aims to address this challenge by providing an accessible framework for biomedical research training. This review describes the development and implementation of the CD2K program, which comprises three core modules: COD1 (reductionist interpretation of collective omics data), COD2 (creation of curated dataset collections), and COD3 (re-analysis of omics data on a global scale). The CD2K approach emphasizes the reuse and reinterpretation of public data, integrating literature mining and emerging technologies like Large Language Models (LLMs). A key feature of the program is its focus on accessibility, designed to make the exploitation of large-scale datasets amenable to researchers without extensive data science skills. The curriculum aims to equip trainees with a range of skills, from basic data interpretation to more advanced bioinformatics analysis, with an emphasis on producing tangible outputs such as peer-reviewed publications, which directly address career development needs. The CD2K initiative has involved researchers from multiple institutions across several countries, resulting in several publications and publicly available dataset collections. While still in its early stages, the program shows promise in providing a structured framework for leveraging public omics data in biomedical research. This review also discusses the current limitations of the CD2K approach and ongoing efforts to expand its reach. By offering an accessible model for building research capacity, the CD2K initiative represents a step towards fostering data-driven discovery in global biomedical research, particularly in resource-limited settings.
Keywords
Omics data; Biomedical research training; Data reuse; Low- and middle-income countries (LMICs)
Subject
Biology and Life Sciences, Immunology and Microbiology
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.