Article
Version 1
Preserved in Portico. This version is not peer-reviewed.
Correlations in Compositional Data without Log-Transformations
Received: 10 May 2023 / Approved: 10 May 2023 / Online: 10 May 2023 (08:57:50 CEST)
A peer-reviewed article of this Preprint also exists.
Monich, Y.V.; Nechipurenko, Y.D. Correlations in Compositional Data without Log Transformations. Axioms 2023, 12, 1084.
Abstract
The article proposes a method for determining the p-value of correlations in compositional data, i.e., data obtained by dividing the original values by their sum. Data of this kind are common in many fields, but there is still no consensus on how to interpret correlations in them. In a space closed by a normalizing quantity, correlation coefficients behave differently than under ordinary conditions: their probabilities of occurrence do not coincide with those given by the standard scale of estimates. Since the 2010s, almost all newly proposed methods for estimating correlation in compositional data have required a log-transformation of the variable values. The method proposed here uses no log-transformations. We return to the early attempts at solving the problem and rely on the negative shifts in correlations in the multinomial distribution. To model the data, we use a hybrid method that combines the hypergeometric distribution with a distribution following any other law. While developing the calculation method, we found that the number of degrees of freedom in compositional data takes discrete values only when all the normalizing sums are equal; when the sums are unequal, it decreases and becomes a continuously varying quantity. Estimating the number of degrees of freedom and the strength of its influence on the magnitude of the shift in the distribution of correlation coefficients is the basis of the proposed method.
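The negative shift mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method: it only shows how closure (dividing each row by its sum) pushes the correlation between two initially independent parts below zero. The function names `pearson` and `closure_demo`, the exponential distribution of the raw values, and the sample sizes are assumptions chosen for illustration. For k independent, identically distributed parts, the correlation after closure is expected to be close to -1/(k-1), because the closed parts sum to a constant.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def closure_demo(n_rows=20000, n_parts=4, seed=0):
    """Compare the correlation of parts 0 and 1 before and after closure.

    Assumption for illustration: raw parts are independent Exp(1) values.
    """
    rng = random.Random(seed)
    raw = [[rng.expovariate(1.0) for _ in range(n_parts)] for _ in range(n_rows)]
    # Closure: divide every value by its row sum, so each row sums to 1.
    closed = []
    for row in raw:
        total = sum(row)
        closed.append([v / total for v in row])
    r_raw = pearson([r[0] for r in raw], [r[1] for r in raw])
    r_closed = pearson([r[0] for r in closed], [r[1] for r in closed])
    return r_raw, r_closed


r_raw, r_closed = closure_demo()
print(f"raw r = {r_raw:.3f}, closed r = {r_closed:.3f}, -1/(k-1) = {-1 / 3:.3f}")
```

With four parts, the raw correlation should stay near zero while the closed one settles near -1/3, so a correlation of zero in closed data is not the neutral reference point it would be in unconstrained data.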
Keywords
compositional data; mathematical expectation shift; loss of degrees of freedom; hybrid model
Subject
Computer Science and Mathematics, Probability and Statistics
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.