A peer-reviewed article of this preprint also exists.
Abstract
High-throughput technologies have led to the accumulation of diverse types of molecular data. These data have different types (discrete, real-valued, string, etc.) and occur in various formats and sizes. Gene expression, miRNA expression, protein-DNA binding data (ChIP-Seq/ChIP-chip), mutation data (copy number variation, single nucleotide polymorphisms), GO annotations, protein-protein interactions and disease-gene associations are some of the genomic datasets commonly used to study biological processes. Each of them provides a unique, complementary and partly independent view of the genome and hence embeds essential information about its regulatory mechanisms. To understand the functions of genes and proteins and to analyze the mechanisms arising from their interactions, the information provided by any one of these datasets may not be sufficient. Integrating these multi-omic data and inferring regulatory interactions from the integrated dataset therefore provides system-level biological insight for predicting gene functions and their phenotypic outcomes. To study genome functionality through interaction networks, different methods have been proposed for collective mining of information from an integrated dataset. We survey here data integration approaches using state-of-the-art techniques such as network integration, Bayesian networks, regularized regression (LASSO) and multiple kernel learning methods.
Subject:
Computer Science and Mathematics - Mathematical and Computational Biology
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.