Integrating multi-view datasets, which comprise heterogeneous sources or different representations, is a challenging step toward understanding the subtle and complex relationships in data. Data integration methods attempt to efficiently combine the complementary information of multiple data types to construct a comprehensive view of the underlying data. Nonnegative matrix factorization (NMF), an approach that can be used for signal compression and noise reduction, has attracted widespread attention over the last two decades. The Kullback–Leibler divergence (or relative entropy) can serve as the loss function of NMF. In this article, we propose a fast and robust framework (RSNMF) based on symmetric nonnegative matrix factorization (SNMF) and similarity network fusion (SNF) for clustering human microbiome data, including functional, metabolic, and phylogenetic profiles. Many existing methods use all the information provided by each view to create a consensus representation, which is often degraded by noise in the data and cannot provide a precise representation of the latent data structures. In contrast, RSNMF combines the strengths of SNMF and SNF to form a robust clustering indicator matrix and can thus reduce the influence of noise. We conduct experiments on one synthetic and two real datasets (microbiome data and text data), and the results show that the proposed RSNMF outperforms the baseline and state-of-the-art methods, which demonstrates the potential of RSNMF for microbiome data analysis.
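For readers unfamiliar with SNMF, the following is a minimal sketch of the generic building block only, not the RSNMF method itself, whose details are not given in this abstract. It assumes a symmetric nonnegative similarity matrix `A` (for example, one produced by a fusion step such as SNF) and factorizes it as `A ≈ H Hᵀ` with `H ≥ 0`, using a commonly used damped multiplicative update under a Frobenius loss rather than the Kullback–Leibler loss mentioned above. The function name `symmetric_nmf` and the parameters `n_iter`, `beta`, and `eps` are hypothetical choices made for illustration.

```python
import numpy as np


def symmetric_nmf(A, k, n_iter=500, beta=0.5, eps=1e-10, seed=0):
    """Illustrative symmetric NMF: find H >= 0 such that A ~= H @ H.T.

    A is assumed to be a symmetric, nonnegative (n x n) similarity matrix.
    This is a generic sketch, not the RSNMF algorithm from the article.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Random positive start, scaled so H @ H.T is on the order of A.
    H = rng.random((n, k)) * 2.0 * np.sqrt(A.mean() / k)
    for _ in range(n_iter):
        numerator = A @ H
        denominator = H @ (H.T @ H) + eps  # eps avoids division by zero
        # Damped multiplicative update; the factor stays nonnegative
        # because every term on the right-hand side is nonnegative.
        H *= (1.0 - beta) + beta * numerator / denominator
    return H


# Toy similarity matrix with two groups of three samples each:
# strong similarity within a group, weak similarity across groups.
A = np.full((6, 6), 0.05)
A[:3, :3] = 1.0
A[3:, 3:] = 1.0

H = symmetric_nmf(A, k=2)
# Treat H as a clustering indicator matrix: assign each sample
# to the column in which its row has the largest entry.
labels = H.argmax(axis=1)
print(labels)
```

Reading `H` as a soft clustering indicator matrix is what makes the symmetric factorization useful for clustering: each row corresponds to a sample, each column to a cluster, and the largest entry in a row gives that sample's cluster assignment.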
Keywords:
Subject: Computer Science and Mathematics - Mathematics
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.