Preprint
Concept Paper

A Geometric Interpretation of the Multivariate Gaussian Distribution and its Entropy and Mutual Information

This version is not peer-reviewed

Submitted: 25 May 2023; Posted: 26 May 2023

Abstract
The fundamental objective is to study the application of multivariate data sets under the Gaussian distribution. This paper examines broad measures of structure for Gaussian and non-Gaussian distributions and shows that they can be described in terms of the information-theoretic divergence (relative entropy) between the given covariance matrix and correlated random variables. To develop the multivariate Gaussian distribution together with its entropy and mutual information, several key methodologies are presented and supported by illustrations, both technically and statistically. The material allows readers to better perceive the concepts, comprehend the techniques, and properly implement software programs for future study of the topic's science and implementations; it also helps readers grasp the themes' fundamental concepts. Involving relative entropy and mutual information as well as covariance analysis of correlated variables based on differential entropy, a wide range of material is addressed, from basic to application concerns.
Keywords: 
Subject: Computer Science and Mathematics  -   Artificial Intelligence and Machine Learning

1. Introduction

Understanding how knowledge about an external variable, or the mutual information among its parts, is distributed across the parts of a multivariate system can help characterize and infer the underlying mechanics and function of the system. This goal has driven the development of several techniques for dissecting the components of a set of variables' joint entropy, or for dissecting the contributions of a set of variables to the mutual information about a variable of interest. In fact, this association and its modifications exist for any input signal and for the widest range of Gaussian channels, comprising discrete-time and continuous-time channels in scalar or vector form.
More generally, mutual information and the minimum mean-square error (MMSE) are fundamental concepts of information theory and estimation theory, respectively. In contrast to the MMSE, which determines how precisely each input sample can be recovered from the channel's outputs, the input-output mutual information measures whether information can be reliably delivered over a channel given a specific input signal. The close relevance of mutual information to estimation and filtering provides an operational characterization of mutual information. Therefore, the significance of the identity is not only obvious, but the link is also fascinating and merits an in-depth explanation [1,2,3]. Relations between the MMSE of the estimate of the output given the input and the local behavior of the mutual information at vanishing SNR are presented in [4]. Reference [6] develops a general likelihood-ratio formula for signal detection in Gaussian noise. Furthermore, whether in a continuous-time [5,6,7] or discrete-time [8] setting, the likelihood ratio is central to the relationship between detection and estimation [9].
Considering the specific instance of parametric estimation (or Gaussian inputs), relations between causal and non-causal estimation errors have been investigated in [10,11], in which a bound on the loss owing to the causality restriction is specified. Knowing how data pertaining to an external parameter, or the mutual information within its parts, is distributed across the parts of a multivariate system can help categorize and determine the fundamental mechanics and functionality of the structure. This goal served as the impetus for the development of various techniques for decomposing the elements of a set of parameters' joint entropy [12,13] or for decomposing the contributions of a set of elements to the mutual information about a target variable [14]. These techniques can be used to examine a variety of intricate systems, including those in the physical and biological domains, such as gene networks [15] or neural coding [16], as well as those in the social domain, such as interacting agents [17] and community behavior [18]. They can also be used to analyze artificial agents [19]. Additionally, some newer proposals diverge more significantly from the original framework, either by adopting novel principles, by considering the presence of detrimental components linked to misinformation, or by implementing joint entropy decompositions in place of mutual information [20,21].
In the multivariate scenario, the challenges of breaking mutual information down into redundant and complementary (synergistic) components are nevertheless significantly greater. The redundancy measures that were initially developed are only defined for the bivariate situation [24,25], or allow negative components [26], whereas measures of synergy are more readily extended to the multivariate case, especially when using the maximum entropy framework [22,23]. By utilizing either the associations between lattices formed by different numbers of parameters or the multiple interactions between redundancy lattices and information-loss lattices, for which synergy terms are more naturally defined, the study in [27] established two analogous techniques for constructing multivariate redundancy measures. The maximum entropy framework allows for a more straightforward generalization of the synergy measures to the multivariate case [24,25].
In the present study, we propose an extension of the bivariate Gaussian distribution technique to calculate multivariate redundancy measures within the maximum entropy framework. The importance of the maximum entropy approach in the bivariate scenario, where it offers bounds for the actual redundancy, unique information, and synergy terms under logical assumptions shared by other criteria, motivates this particular focus [24]. Specifically, the maximum entropy measures offer a lower bound for the actual synergy and redundancy terms and an upper bound for the actual unique information if it is presumed that a bivariate non-negative decomposition exists and that redundancy can be calculated from the bivariate distributions of the desired outcome with every source. Furthermore, if these bivariate distributions are consistent with possibly having no synergy under the previous hypotheses, then the maximum entropy decomposition returns not only bounds but also the precise actual terms. Here, we demonstrate that, under similar presumptions, the maximum entropy decomposition also plays this dominant role in the multivariate situation.
The remainder of this paper is organized as follows. Section 2 briefly reviews the geometry of the Gaussian distribution. The subsequent three sections deal with important topics in information entropy, with illustrative examples and an emphasis on visualization and discussion. Section 3 presents continuous (differential) entropy. Section 4 presents the relative entropy (Kullback-Leibler divergence). Section 5 presents mutual information. Conclusions are given in Section 6.

2. Geometry of the Gaussian Distribution

In this section, the background relations of the Gaussian distribution are discussed from different parametric points of view. The fundamental objective of exploratory analysis is to identify "the structure" in multivariate data sets. Ordinary least-squares regression and principal component analysis (PCA) offer typical measures of dependency (the predicted connection between particular components) and compactness (the degree of concentration of the probability density function (pdf) around a low-dimensional axis), respectively, for bivariate Gaussian distributions. Mutual information, an established measure of dependency, is not an accurate indicator of compactness since it is not invariant under a rotation of the variables. For bivariate Gaussian distributions, a suitable rotation-invariant compactness measure can be constructed and shown to reduce to the corresponding PCA measure.
The Gaussian pdf in case (a) has no structure in either of the above senses: it represents independent variables without any concentration around a lower-dimensional region. The Gaussian pdf in case (b), on the other hand, has greater variance along one axis than another; although the variables are independent, their joint pdf is compact. The Gaussian pdf in case (c) is as concentrated around one dimension as (b), although its variables are correlated and therefore also exhibit dependency.

2.1. Standard Parametric Representation of an Ellipse

If the data are uncorrelated and therefore have zero covariance, the ellipse is not rotated and is axis-aligned; the radii of the ellipse in the two directions are then determined by the standard deviations. Geometrically, a non-rotated ellipse centered at the point (0, 0) with radii $a$ and $b$ in the $x_1$- and $x_2$-directions is described by
$$\left(\frac{x}{a}\right)^{2} + \left(\frac{y}{b}\right)^{2} = 1 \tag{1}$$
Figure 1 illustrates the construction of single points of an ellipse, which is due to de La Hire; it is based on the standard parametric representation.
The general probability density function for the multivariate Gaussian is given by
$$f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol{\mu},\boldsymbol{\Sigma}) = \frac{1}{\sqrt{(2\pi)^{n}|\boldsymbol{\Sigma}|}}\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})} \tag{2}$$
where $\boldsymbol{\mu} = E[\mathbf{X}]$ and $\boldsymbol{\Sigma} = \mathrm{Cov}(\mathbf{X}) = E[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^{T}]$ is a symmetric, positive semi-definite matrix. If $\boldsymbol{\Sigma}$ is the identity matrix, then the Mahalanobis distance reduces to the standard Euclidean distance between $\mathbf{X}$ and $\boldsymbol{\mu}$.
For bivariate Gaussian distributions, the pdf can be expressed as
$$f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol{\mu},\boldsymbol{\Sigma}) = \frac{1}{2\pi\sqrt{|\boldsymbol{\Sigma}|}}\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})} \tag{3}$$
and mean and covariance matrix are given by
$$\boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix};\qquad \boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix} \tag{4}$$
respectively, where the linear correlation coefficient satisfies $|\rho| \le 1$.
Variance measures the variation of a single random variable, whereas covariance measures how much two random variables vary together. With the covariances we can populate the entries of the covariance matrix, which is a square, symmetric matrix. The diagonal entries of the covariance matrix are the variances, while the off-diagonal entries are the covariances. For this reason, the covariance matrix is often called the variance-covariance matrix.
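As a quick numerical illustration (a Python/NumPy sketch; the means, standard deviations, and correlation below are assumed example values, not taken from the paper), the variance-covariance matrix and the correlation coefficient can be estimated from sampled data as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example parameters: means, standard deviations, and correlation.
mu = np.array([1.0, 2.0])
sigma1, sigma2, rho = 2.0, 3.0, 0.5
Sigma = np.array([[sigma1**2, rho*sigma1*sigma2],
                  [rho*sigma1*sigma2, sigma2**2]])

# Draw samples and estimate the variance-covariance matrix from the data.
X = rng.multivariate_normal(mu, Sigma, size=100_000)   # shape (N, 2)
S = np.cov(X, rowvar=False)                            # sample covariance, 2 x 2 and symmetric

print(S)                                      # close to Sigma
print(S[0, 1] / np.sqrt(S[0, 0] * S[1, 1]))   # sample correlation, close to rho
```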

2.2. The Confidence Ellipse

A typical way to visualize two-dimensional Gaussian distributed data is to plot a confidence ellipse. The squared Mahalanobis distance $d_M^2 = (\mathbf{X}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\boldsymbol{\mu})$ is a random variable that follows the chi-squared distribution with $k$ degrees of freedom, denoted $\chi_k^2$:
$$P\!\left[(\mathbf{X}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\boldsymbol{\mu}) \le \chi_k^2(\alpha)\right] = 1 - \alpha \tag{5}$$
where $k$ is the number of degrees of freedom and $\chi_k^2(\alpha)$ is the critical value associated with the given probability. For example, with $\alpha = 0.05$ the 95% confidence ellipse is obtained. Extending Equation (1), the radius in each direction is the standard deviation ($\sigma_1$ and $\sigma_2$) parameterized by a scale factor $s$, known as the Mahalanobis radius of the ellipsoid:
$$\left(\frac{x_1}{\sigma_1}\right)^{2} + \left(\frac{x_2}{\sigma_2}\right)^{2} = s \tag{6}$$
The goal is to determine the scale $s$ such that a confidence level $p$ is met. Since the data are multivariate Gaussian distributed, the left-hand side of the equation is a sum of squares of standardized Gaussian samples, which follows a $\chi^2$ distribution. A $\chi^2$ distribution is defined by its degrees of freedom, and since we have two dimensions the number of degrees of freedom is also two. We now want to know the probability that the sum, and therefore $s$, does not exceed a certain value under the $\chi^2$ distribution.
This ellipse, also a probability contour, defines the region of minimum area (or volume in the multivariate case) containing a given probability under the Gaussian assumption. The equation can be solved using a $\chi^2$ table or simply using the relation $s = -2\ln(1-p)$; conversely, the confidence level follows from $p = 1 - e^{-s/2}$. For $s = 1$ we have $p = 1 - e^{-0.5} \approx 0.3935$. Furthermore, typical values include $s = 2.279$, $s = 4.605$, $s = 5.991$, and $s = 9.210$ for $p = 0.68$, $p = 0.90$, $p = 0.95$, and $p = 0.99$, respectively. The ellipse can then be drawn with radii $\sigma_1\sqrt{s}$ and $\sigma_2\sqrt{s}$.
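The relation between $p$ and $s$ is easy to verify numerically. The following sketch (using SciPy's chi-square quantile function; purely illustrative) reproduces the values quoted above:

```python
import numpy as np
from scipy.stats import chi2

# Scale factor s for a given confidence p with 2 degrees of freedom,
# and the inverse relation p = 1 - exp(-s/2).
for p in (0.68, 0.90, 0.95, 0.99):
    s = chi2.ppf(p, df=2)          # equals -2*ln(1 - p) for df = 2
    print(f"p = {p:.2f} -> s = {s:.3f}, back to p = {1 - np.exp(-s/2):.2f}")
```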
Figure 2. Relation of the confidence interval and the scale factor s.
The Mahalanobis distance accounts for the variance of each variable and the covariance between variables.
$$(\mathbf{X}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\boldsymbol{\mu}) = \begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 \end{bmatrix} \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}^{-1} \begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix} = \begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 \end{bmatrix} \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)} \begin{bmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix} \begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix} = \frac{1}{1-\rho^2}\left[ \frac{(x_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2} \right] \tag{7}$$
Geometrically, it does this by transforming the data into standardized uncorrelated data and computing the ordinary Euclidean distance for the transformed data. In this way, the Mahalanobis distance is like a univariate z-score: it provides a way to measure distances that takes into account the scale of the data.
In the general case, the covariances $\sigma_{12}$ and $\sigma_{21}$ are not zero and therefore the ellipse coordinate system is not axis-aligned. In such a case, instead of using the variances as spread indicators, we use the eigenvalues of the covariance matrix. The eigenvalues represent the spread in the directions of the eigenvectors; they are the variances under a rotated coordinate system. By definition a covariance matrix is positive (semi-)definite, so all eigenvalues are non-negative, and the matrix can be seen as a linear transformation of the data. The actual radii of the ellipse are $\sqrt{\lambda_1}$ and $\sqrt{\lambda_2}$ for the two eigenvalues $\lambda_1$ and $\lambda_2$ of the scaled covariance matrix $s\boldsymbol{\Sigma}$.
Based on Equations (3) and (7), the bivariate Gaussian distributions can be represented as
$$f(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, \exp\!\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{(x_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2} \right] \right\} \tag{8}$$
Level surfaces of $f(x_1,x_2)$ are concentric ellipses
$$\frac{(x_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2} = c$$
where the constant $c$ is related to the squared Mahalanobis distance, which possesses the following properties:
  • It accounts for the fact that the variances in each direction are different.
  • It accounts for the covariance between variables.
  • It reduces to the familiar Euclidean distance for uncorrelated variables with unit variance.
The lengths of the ellipse axes are a function of the given probability of the chi-squared distribution with 2 degrees of freedom, $\chi_2^2(\alpha)$, the eigenvalues $\boldsymbol{\lambda} = \begin{bmatrix}\lambda_1 & \lambda_2\end{bmatrix}^{T}$, and the linear correlation coefficient $\rho$. For $\alpha = 0.05$, the 95% confidence ellipse is defined by
$$\begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 \end{bmatrix} \boldsymbol{\Sigma}^{-1} \begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix} \le \chi_2^2(0.05)$$
where
$$\boldsymbol{\Sigma}^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)} \begin{bmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix}$$
Since $\boldsymbol{\Sigma}$ is a symmetric matrix, its eigenvectors are linearly independent (indeed orthogonal).
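For concreteness, the following NumPy sketch (with an assumed covariance matrix) computes the eigenvalues, eigenvectors, inclination angle, and the 95% confidence-ellipse axis lengths $2\sqrt{\lambda_i\,\chi_2^2(0.05)}$:

```python
import numpy as np
from scipy.stats import chi2

# Assumed example covariance matrix (sigma1 = 2, sigma2 = 1, rho = 0.5).
sigma1, sigma2, rho = 2.0, 1.0, 0.5
Sigma = np.array([[sigma1**2, rho*sigma1*sigma2],
                  [rho*sigma1*sigma2, sigma2**2]])

lam, U = np.linalg.eigh(Sigma)       # eigenvalues (ascending) and orthonormal eigenvectors
lam, U = lam[::-1], U[:, ::-1]       # reorder so that lambda_1 >= lambda_2

s = chi2.ppf(0.95, df=2)             # chi-square critical value, about 5.991
axis_lengths = 2 * np.sqrt(lam * s)  # full lengths of the major and minor axes
theta = np.degrees(np.arctan2(U[1, 0], U[0, 0]))  # inclination of the major axis (eigenvector sign is arbitrary)

print(lam, axis_lengths, theta)
```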

2.3. Similarity Transform

The simplest similarity transformation method for eigenvalue computation is the Jacobi method, which deals with standard eigenproblems. For the multivariate Gaussian distribution, the covariance matrix can be expressed in terms of its eigenvectors and eigenvalues as
$$\boldsymbol{\Sigma} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{-1} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{T} = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} \mathbf{u}_1^{T} \\ \mathbf{u}_2^{T} \end{bmatrix}$$
where $\mathbf{U} = \begin{bmatrix}\mathbf{u}_1 & \mathbf{u}_2\end{bmatrix}$ contains the eigenvectors of $\boldsymbol{\Sigma}$ and $\boldsymbol{\Lambda}$ is the diagonal matrix of the eigenvalues $\boldsymbol{\lambda} = \begin{bmatrix}\lambda_1 & \lambda_2\end{bmatrix}^{T}$:
$$\boldsymbol{\Lambda} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$$
Replacing $\boldsymbol{\Sigma}^{-1}$ by $\boldsymbol{\Sigma}^{-1} = \mathbf{U}\boldsymbol{\Lambda}^{-1}\mathbf{U}^{-1}$, the quadratic form (squared Mahalanobis distance) can be written as:
$$\begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 \end{bmatrix} \mathbf{U}\boldsymbol{\Lambda}^{-1}\mathbf{U}^{-1} \begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix} \le \chi_2^2(0.05)$$
since $\mathbf{U}^{T} = \mathbf{U}^{-1}$. Denoting
$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \mathbf{U}^{-1} \begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix}$$
the quadratic form can then be expressed as:
$$\begin{bmatrix} y_1 & y_2 \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}^{-1} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \le \chi_2^2(0.05)$$
If the above equation is further evaluated, the result is the equation of an ellipse aligned with the axes $y_1$ and $y_2$ of the new coordinate system:
$$\frac{y_1^2}{\chi_2^2(0.05)\,\lambda_1} + \frac{y_2^2}{\chi_2^2(0.05)\,\lambda_2} \le 1$$
The axes of the ellipse lie along the $y_1$ and $y_2$ directions, with lengths $2\sqrt{\lambda_1\,\chi_2^2(0.05)}$ and $2\sqrt{\lambda_2\,\chi_2^2(0.05)}$, respectively.
When $\rho = 0$, the eigenvalues equal $\lambda_1 = \sigma_1^2$ and $\lambda_2 = \sigma_2^2$. Also, the matrix $\mathbf{U}$, whose columns are the eigenvectors of $\boldsymbol{\Sigma}$, becomes the identity matrix. The final equation of the ellipse is then defined by
$$\frac{(x_1-\mu_1)^2}{\chi_2^2(0.05)\,\lambda_1} + \frac{(x_2-\mu_2)^2}{\chi_2^2(0.05)\,\lambda_2} \le 1$$
It is clear from the above equation that the axes of the ellipse are parallel to the coordinate axes. The lengths of the axes of the ellipse are then $2\sqrt{\sigma_1^2\,\chi_2^2(0.05)}$ and $2\sqrt{\sigma_2^2\,\chi_2^2(0.05)}$.
The covariance matrix can be represented by its eigenvectors and eigenvalues: $\boldsymbol{\Sigma}\mathbf{U} = \mathbf{U}\boldsymbol{\Lambda}$, where $\mathbf{U}$ is the matrix whose columns are the eigenvectors of $\boldsymbol{\Sigma}$ and $\boldsymbol{\Lambda}$ is the diagonal matrix with diagonal elements given by the eigenvalues of $\boldsymbol{\Sigma}$. The transformation is performed in three steps involving scaling, rotation, and translation.
1. Scaling
The covariance matrix can be written as $\boldsymbol{\Sigma} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{-1} = \mathbf{U}\mathbf{S}\mathbf{S}\mathbf{U}^{-1}$, where $\mathbf{S}$ is a diagonal scaling matrix, $\mathbf{S} = \boldsymbol{\Lambda}^{1/2} = \mathbf{S}^{T}$.
2. Rotation
The rotation matrix $\mathbf{U}$ is formed from the normalized eigenvectors of the covariance matrix $\boldsymbol{\Sigma}$:
$$\mathbf{U} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
Note that $\mathbf{U}$ is an orthogonal matrix, $\mathbf{U}^{-1} = \mathbf{U}^{T}$ and $|\mathbf{U}| = 1$. Define the combined rotation and scaling matrix $\mathbf{T} = \mathbf{U}\mathbf{S}$, so that $\mathbf{T}^{T} = (\mathbf{U}\mathbf{S})^{T} = \mathbf{S}^{T}\mathbf{U}^{T} = \mathbf{S}\mathbf{U}^{-1}$. The covariance matrix can thus be written as $\boldsymbol{\Sigma} = \mathbf{T}\mathbf{T}^{T}$, and $\mathbf{U}^{T}\boldsymbol{\Sigma}\mathbf{U} = \boldsymbol{\Lambda}$ is diagonal with eigenvalues $\lambda_i$. Since $\mathbf{T} = \mathbf{U}\mathbf{S}$, we have $\mathbf{Y} = \mathbf{T}\mathbf{X} = \mathbf{U}\mathbf{S}\mathbf{X} = \mathbf{U}\boldsymbol{\Lambda}^{1/2}\mathbf{X}$.
$$\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} u_{1x} & u_{2x} \\ u_{1y} & u_{2y} \end{bmatrix} \begin{bmatrix} \sqrt{\lambda_1}\cos t \\ \sqrt{\lambda_2}\sin t \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \sqrt{\lambda_1}\cos t \\ \sqrt{\lambda_2}\sin t \end{bmatrix}$$
The similarity transform is applied to obtain the relation $\mathbf{X}^{T}\boldsymbol{\Sigma}^{-1}\mathbf{X} = \mathbf{Y}^{T}\mathbf{U}^{T}\boldsymbol{\Sigma}^{-1}\mathbf{U}\mathbf{Y} = \mathbf{Y}^{T}\boldsymbol{\Lambda}^{-1}\mathbf{Y}$, and the pdf of the vector $\mathbf{Y}$ can be found to be
$$f_{\mathbf{Y}}(\mathbf{y}) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\lambda_i}}\, e^{-\frac{1}{2}\frac{y_i^2}{\lambda_i}}$$
The ellipse in the transformed frame can be represented as
$$\frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2} = c$$
where the eigenvalues $\lambda_1$ and $\lambda_2$ are the variances in the transformed frame (equal to $\sigma_1^2$ and $\sigma_2^2$ when $\rho = 0$).
3. Translation
$$x_1(t) = \sqrt{\lambda_1}\cos\theta\cos t - \sqrt{\lambda_2}\sin\theta\sin t + \mu_1$$
$$x_2(t) = \sqrt{\lambda_1}\sin\theta\cos t + \sqrt{\lambda_2}\cos\theta\sin t + \mu_2$$
The eigenvalues $\boldsymbol{\lambda} = \begin{bmatrix}\lambda_1 & \lambda_2\end{bmatrix}^{T}$ can be calculated through
$$\lambda_1 = \frac{1}{2}\left[ \sigma_1^2 + \sigma_2^2 + \sqrt{(\sigma_1^2-\sigma_2^2)^2 + 4\rho^2\sigma_1^2\sigma_2^2} \right];\qquad \lambda_2 = \frac{1}{2}\left[ \sigma_1^2 + \sigma_2^2 - \sqrt{(\sigma_1^2-\sigma_2^2)^2 + 4\rho^2\sigma_1^2\sigma_2^2} \right]$$
and thus
$$|\boldsymbol{\Sigma}| = \lambda_1\lambda_2 = \sigma_1^2\sigma_2^2(1-\rho^2)$$
From another viewpoint, the covariance matrix can be calculated as
$$\boldsymbol{\Sigma} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{T} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \lambda_1\cos\theta & -\lambda_2\sin\theta \\ \lambda_1\sin\theta & \lambda_2\cos\theta \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \lambda_1\cos^2\theta + \lambda_2\sin^2\theta & (\lambda_1-\lambda_2)\sin\theta\cos\theta \\ \mathrm{sym.} & \lambda_1\sin^2\theta + \lambda_2\cos^2\theta \end{bmatrix}$$
Calculation of the determinant of the covariance matrix above gives the same result, and the inverse is
$$\boldsymbol{\Sigma}^{-1} = \frac{1}{\lambda_1\lambda_2} \begin{bmatrix} \lambda_1\sin^2\theta + \lambda_2\cos^2\theta & (\lambda_2-\lambda_1)\sin\theta\cos\theta \\ \mathrm{sym.} & \lambda_1\cos^2\theta + \lambda_2\sin^2\theta \end{bmatrix} = \begin{bmatrix} \dfrac{\sin^2\theta}{\lambda_2} + \dfrac{\cos^2\theta}{\lambda_1} & \sin\theta\cos\theta\left(\dfrac{1}{\lambda_1} - \dfrac{1}{\lambda_2}\right) \\ \mathrm{sym.} & \dfrac{\sin^2\theta}{\lambda_1} + \dfrac{\cos^2\theta}{\lambda_2} \end{bmatrix}$$
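The scaling-rotation-translation construction above can be sketched directly in code. The following NumPy snippet (the covariance matrix, mean, and confidence level are assumed example values) generates the parametric ellipse points $\mathbf{x}(t) = \mathbf{U}\sqrt{s\boldsymbol{\Lambda}}\,[\cos t\ \ \sin t]^{T} + \boldsymbol{\mu}$:

```python
import numpy as np

def ellipse_points(mu, Sigma, s=5.991, num=200):
    """Parametric confidence-ellipse points: scale by sqrt(s*lambda_i), rotate by U, translate by mu."""
    lam, U = np.linalg.eigh(Sigma)                        # eigenvalues and eigenvectors of Sigma
    t = np.linspace(0.0, 2.0 * np.pi, num)
    circle = np.vstack([np.cos(t), np.sin(t)])            # points on the unit circle
    return (U @ (np.sqrt(s * lam)[:, None] * circle)) + np.asarray(mu)[:, None]

# Assumed example: sigma1 = 4, sigma2 = 2, rho = 0.5, mean (1, 2), 95% ellipse (s ~ 5.991).
Sigma = np.array([[16.0, 4.0], [4.0, 4.0]])
pts = ellipse_points([1.0, 2.0], Sigma)                   # shape (2, num), ready for plotting
```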

2.4. Simulation with a Given Variance-covariance Matrix

Given data $\mathbf{X}\sim N(\boldsymbol{\mu},\boldsymbol{\Sigma})$, an ellipse representing the confidence $p$ can be plotted by calculating the radii of the ellipse, its center, and its rotation. One may specify $\theta$ (from which $\mathbf{U}$ is obtained) and $\mathbf{S}$ to generate the covariance matrix $\boldsymbol{\Sigma}$, from which $\rho$ can be derived. The inclination angle is calculated through:
$$\theta = \begin{cases} 0 & \text{if } \sigma_{12} = 0 \ \text{and}\ \sigma_1^2 \ge \sigma_2^2 \\ \pi/2 & \text{if } \sigma_{12} = 0 \ \text{and}\ \sigma_1^2 < \sigma_2^2 \\ \operatorname{atan2}\!\left(\lambda_1 - \sigma_1^2,\ \sigma_{12}\right) & \text{otherwise} \end{cases}$$
which can be used in the calculation of $\mathbf{U}$,
$$\mathbf{U} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
and the covariance matrix can be evaluated as $\boldsymbol{\Sigma} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{T} = \mathbf{U}\mathbf{S}\mathbf{S}\mathbf{U}^{T}$ if $\mathbf{S}$ is specified. Alternatively, given the correlation coefficient $\rho$ and the variances for generating the covariance matrix $\boldsymbol{\Sigma}$, the angle $\theta$ can be obtained.
To generate sampling points that meet a specified correlation, the following procedure can be used. Given two random variables $X_1$ and $X_2$, consider their linear combination $Y = \alpha X_1 + \beta X_2$. For the generation of correlated random variables, if we have two uncorrelated Gaussian random variables $X_1$ and $X_2$, then we can create two correlated random variables using the formula
$$Y = \rho X_1 + \sqrt{1-\rho^2}\, X_2$$
and then Y will have a correlation ρ with X 1 :
$$\rho = \sigma_{12}/(\sigma_1\sigma_2)$$
Based on the relation $\mathbf{X} = \mathbf{A}\mathbf{Z} + \boldsymbol{\mu}$ with $\mathbf{Z}\sim N(\mathbf{0},\mathbf{I})$, the following expression can be employed to generate the sampling points for the scatter plots using the MATLAB software:
X = A * randn(2, K) + mu * ones(1, K)
where $\mathbf{A}$ is the lower triangular matrix from the Cholesky decomposition of $\boldsymbol{\Sigma}$, i.e., $\boldsymbol{\Sigma} = \mathbf{A}\mathbf{A}^{T}$, and $\boldsymbol{\mu}$ is the vector of mean values.
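An equivalent Python/NumPy sketch mirroring the MATLAB expression above (the mean vector and covariance matrix are assumed example values) is:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 5000
mu = np.array([[1.0], [2.0]])                 # column vector of means
Sigma = np.array([[4.0, 2.0], [2.0, 4.0]])    # assumed covariance: sigma1 = sigma2 = 2, rho = 0.5

A = np.linalg.cholesky(Sigma)                 # lower-triangular A with A @ A.T = Sigma
Z = rng.standard_normal((2, K))               # Z ~ N(0, I)
X = A @ Z + mu                                # X = A*Z + mu, so Cov(X) is approximately Sigma

print(np.cov(X))                              # sample covariance, close to Sigma
```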
When $\rho = 0$, the axes of the ellipse are parallel to the original coordinate system, and when $\rho \ne 0$, the axes of the ellipse are aligned with the rotated axes of the transformed coordinate system. Figure 3 and Figure 4 display ellipses drawn for various confidence levels. The plots illustrate confidence (error) ellipses with different confidence levels (i.e., 68%, $s = 2.279$; 90%, $s = 4.605$; 95%, $s = 5.991$; 99%, $s = 9.210$, from inner to outer ellipses), considering the cases where the random variables are (1) positively correlated ($\rho > 0$), (2) negatively correlated ($\rho < 0$), and (3) independent ($\rho = 0$). More specifically, Figure 3 shows the position of the ellipse for various correlation coefficients given the angle of inclination, i.e., $\theta$ is specified to obtain $\rho = \sigma_{12}/(\sigma_1\sigma_2)$: (a) $\theta = 30°$, $\rho \approx 0.55$; (b) $\theta = 0°$, $\rho = 0$; (c) $\theta = 150°$, $\rho \approx -0.55$. On the other hand, Figure 4 shows the position of the ellipse for various values of the correlation coefficient given the angle of inclination, i.e., $\rho$ is specified to obtain $\theta$: (a) $\rho = 0.95$, $\theta = 45°$; (b) $\rho = 0$, $\theta = 0°$; (c) $\rho = -0.95$, $\theta = 135°$. The rotation angle is measured over $0° \le \theta \le 180°$ with respect to the positive axis. When $\rho > 0$ the angle lies in the first quadrant, and when $\rho < 0$ it lies in the second quadrant.
In the following, several scenarios with further illustrations are examined.
(1) Equal variances for two random variables with nonzero ρ :
Case 1: Fixed correlation coefficient. As an example, when $\rho = 0.5$ and the variances $\sigma_1 = \sigma_2 = \sigma$ range from 2 to 5, the results are as shown in Figure 5. As can be seen, the contours and the scatter plots are ellipses instead of circles.
$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix} = \begin{bmatrix} 4^2 & 0.5(4)(2) \\ 0.5(4)(2) & 2^2 \end{bmatrix} = \begin{bmatrix} 16 & 4 \\ 4 & 4 \end{bmatrix}$$
Subplot (a) in Figure 6 shows the ellipses for $\rho = 0.5$ with varying variances. In the present and subsequent illustrations, 95% confidence levels are shown.
Case 2: Increasing correlation coefficient $\rho$ from zero. With fixed variance $\sigma_1 = \sigma_2 = \sigma$, the contour is initially a circle when $\rho = 0$ and becomes an ellipse as $\rho$ increases from zero. Subplot (b) in Figure 6 provides the contours with scatter plots for $\rho = 0, 0.5, 0.9, 0.99$, respectively, when $\sigma_1 = \sigma_2 = 2$. The eccentricity of the ellipses increases with increasing $\rho$.
(2) Unequal variances for two random variables, $\sigma_1 \ne \sigma_2$, with fixed correlation coefficient $\rho = 0.5$. Case 1: $\sigma_1 > \sigma_2$. The variation of the three-dimensional surfaces and ellipses with increasing $\sigma_1/\sigma_2$ is presented in Figure 7 and Figure 8a, where $\sigma_1 = 2 \sim 5$ and $\sigma_2 = 2$.
Case 2: $\sigma_2 > \sigma_1$. The variation of the ellipses with increasing $\sigma_2/\sigma_1$ is presented in Figure 8b, where $\sigma_2 = 2 \sim 5$ and $\sigma_1 = 2$. Figure 9 shows the variation of the inclination angle $\theta$ as a function of $\sigma_1$ and $\sigma_2$ for $\rho = 0$ and $\rho = 0.5$, providing further insight into how $\theta$ varies with $\sigma_1$ and $\sigma_2$.
(3) Variation of the ellipses for various positive and negative correlations. For a given pair of variances, once $\rho$ is specified, the eigenvalues and the inclination angle are obtained accordingly. Figure 10 presents results for the cases $\sigma_1 > \sigma_2$ ($\sigma_1 = 4$, $\sigma_2 = 2$ in this example) and $\sigma_2 > \sigma_1$ ($\sigma_1 = 2$, $\sigma_2 = 4$ in this example) with various correlation coefficients (positive, zero, and negative), namely $\rho = 0, 0.5, 0.9, 0.99$ and $\rho = 0, -0.5, -0.9, -0.99$. In the figure, $\sigma_1 = 4$, $\sigma_2 = 2$ apply to the top plots, while $\sigma_1 = 2$, $\sigma_2 = 4$ apply to the bottom plots; $\rho = 0, 0.5, 0.9, 0.99$ apply to the left plots, while $\rho = 0, -0.5, -0.9, -0.99$ apply to the right plots. Furthermore, Figure 11 provides a comparison of the ellipses for various $\sigma_1$ and $\sigma_2$ for the following cases: (i) $\sigma_1 = 2$, $\sigma_2 = 4$; (ii) $\sigma_1 = 4$, $\sigma_2 = 2$; (iii) $\sigma_1 = \sigma_2 = 2$; (iv) $\sigma_1 = \sigma_2 = 4$, with fixed $\rho = 0.5$.

3. Continuous Entropy/Differential Entropy

Differential entropy (also referred to as continuous entropy) is a concept in information theory that originated in Claude Shannon's attempt to extend the idea of (Shannon) entropy, a measure of the average surprisal of a random variable, to continuous probability distributions. Shannon did not actually derive this formula; he simply assumed it was the correct continuous analogue of discrete entropy, which it is not. The actual continuous analogue of discrete entropy is the limiting density of discrete points (LDDP). Differential entropy (described here) is commonly encountered in the literature, but it is a limiting case of the LDDP, one that loses its fundamental association with discrete entropy.
In the following discussion, differential entropy and relative entropy are measured in bits, since $\log_2$ is used in the definitions. If ln is used instead, they are measured in nats; the only difference in the expressions is a factor of $\log_2 e$.

3.1. Entropy of a Univariate Gaussian Distribution

If we have a continuous random variable $X$ with a probability density function (pdf) $f_X(x)$, the differential entropy of $X$ in bits is expressed as
$$h(X) = -E[\log_2 f_X(x)] = -\int_{-\infty}^{\infty} f_X(x)\log_2 f_X(x)\,dx$$
Let $X$ be a Gaussian random variable, $X \sim N(\mu, \sigma^2)$:
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
The differential entropy for this univariate Gaussian distribution can be evaluated as (Appendix A)
$$h(X) = -E[\log_2 f_X(x)] = -\int_{-\infty}^{\infty} f_X(x)\log_2\!\left[\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\right] dx = \frac{1}{2}\log_2(2\pi e\sigma^2)$$
Figure 12 shows the differential entropy as a function of $\sigma^2$ for the univariate Gaussian variable; it is concave downward and grows quickly at first and then much more slowly at high values of $\sigma^2$.
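The closed form is easy to cross-check numerically. Below is a small Python/SciPy sketch (purely illustrative; the $\sigma$ values are assumed) comparing the formula with SciPy's built-in differential entropy, converted from nats to bits:

```python
import numpy as np
from scipy.stats import norm

# Differential entropy of a univariate Gaussian: h(X) = 0.5*log2(2*pi*e*sigma^2) bits.
for sigma in (0.5, 1.0, 2.0, 4.0):
    h_bits = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
    h_scipy_bits = norm(scale=sigma).entropy() / np.log(2)   # SciPy returns nats; convert to bits
    print(f"sigma = {sigma}: {h_bits:.4f} bits (closed form), {h_scipy_bits:.4f} bits (SciPy)")
```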

3.2. Entropy of a Multivariate Gaussian Distribution

Let $\mathbf{X}$ follow a multivariate Gaussian distribution, $\mathbf{X}\sim N(\boldsymbol{\mu},\boldsymbol{\Sigma})$, as given by Equation (2). The differential entropy of $\mathbf{X}$ in bits is
$$h(\mathbf{X}) = -E[\log_2 f_{\mathbf{X}}(\mathbf{x})] = -\int f_{\mathbf{X}}(\mathbf{x})\log_2 f_{\mathbf{X}}(\mathbf{x})\,d\mathbf{x}$$
and the differential entropy is given by (Appendix B)
$$h(\mathbf{X}) = \frac{1}{2}\log_2\!\left((2\pi e)^{n}|\boldsymbol{\Sigma}|\right)$$
The above calculation involves evaluating the expectation of the squared Mahalanobis distance (Appendix C):
$$E\left[(\mathbf{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right] = n$$
For a fixed variance, the Gaussian distribution is the pdf that maximizes entropy. Let $\mathbf{X} = \begin{bmatrix}X_1 & X_2\end{bmatrix}^{T}$ be a 2-D Gaussian vector; the entropy of $\mathbf{X}$ can be calculated to be
$$h(\mathbf{X}) = h(X_1, X_2) = \frac{1}{2}\log_2\!\left[(2\pi e)^{2}|\boldsymbol{\Sigma}|\right] = \log_2\!\left(2\pi e\,\sigma_1\sigma_2\sqrt{1-\rho^2}\right)$$
with covariance matrix
$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$$
If σ 1 = σ 2 = σ , this becomes
$$h(X_1, X_2) = \log_2\!\left(2\pi e\,\sigma^2\sqrt{1-\rho^2}\right)$$
which is a concave (downward) function of $\rho^2$ that decreases slowly at first and then very rapidly as $\rho^2$ approaches 1, as shown in Figure 13.
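The bivariate entropy formula can likewise be verified against SciPy's multivariate normal entropy. The sketch below uses assumed example values of $\sigma_1$, $\sigma_2$, and $\rho$:

```python
import numpy as np
from scipy.stats import multivariate_normal

# h(X1, X2) = log2(2*pi*e*sigma1*sigma2*sqrt(1 - rho^2)) bits for a bivariate Gaussian.
sigma1, sigma2, rho = 2.0, 3.0, 0.7
Sigma = np.array([[sigma1**2, rho*sigma1*sigma2],
                  [rho*sigma1*sigma2, sigma2**2]])

h_bits = np.log2(2 * np.pi * np.e * sigma1 * sigma2 * np.sqrt(1 - rho**2))
h_scipy_bits = multivariate_normal(mean=[0, 0], cov=Sigma).entropy() / np.log(2)  # nats -> bits
print(h_bits, h_scipy_bits)   # the two values agree
```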

3.3. The Differential Entropy in the Transformed Frame

The differential entropy is invariant to a translation (change in the mean of the pdf)
$$h(X + a) = h(X)$$
and
$$h(bX) = h(X) + \log_2|b|$$
For a random vector, the differential entropy in the transformed frame remains the same as in the original frame. In general it can be shown that
$$h(\mathbf{Y}) = h(\mathbf{U}\mathbf{X}) = h(\mathbf{X}) + \log_2|\mathbf{U}| = h(\mathbf{X})$$
For the case of multivariate Gaussian distribution, we have
$$h(\mathbf{X}) = \frac{1}{2}\log_2\!\left[(2\pi e)^{n}|\boldsymbol{\Sigma}|\right] = \frac{n}{2}\log_2(2\pi e) + \frac{1}{2}\log_2|\boldsymbol{\Sigma}| = \frac{n}{2}\log_2(2\pi e) + \sum_{i=1}^{n}\frac{1}{2}\log_2\lambda_i$$
It is known that the determinant of the covariance matrix is equal to the product of its eigenvalues:
$$|\boldsymbol{\Sigma}| = \prod_{i=1}^{n}\lambda_i$$
For the case of bivariate Gaussian distribution, n = 2 , we have
$$f_{\mathbf{Y}}(\mathbf{y}) = \prod_{i=1}^{2}\frac{1}{\sqrt{2\pi\lambda_i}}\, e^{-\frac{1}{2}\frac{y_i^2}{\lambda_i}} = \frac{1}{\sqrt{2\pi\lambda_1}}\, e^{-\frac{1}{2}\frac{y_1^2}{\lambda_1}}\cdot \frac{1}{\sqrt{2\pi\lambda_2}}\, e^{-\frac{1}{2}\frac{y_2^2}{\lambda_2}} = \frac{1}{2\pi\sqrt{\lambda_1\lambda_2}}\, e^{-\frac{1}{2}\left(\frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2}\right)}$$
It can be shown that the entropy in the transformed frame is given by
$$h(\mathbf{Y}) = \frac{2}{2}\log_2(2\pi e) + \frac{1}{2}\sum_{i=1}^{2}\log_2\lambda_i = \log_2(2\pi e) + \log_2\sqrt{\lambda_1\lambda_2}$$
The detailed derivation is provided in Appendix D. As discussed, the determinant of the covariance matrix is equal to the product of its eigenvalues:
$$|\boldsymbol{\Sigma}| = \lambda_1\lambda_2 = \frac{1}{2}\left[\sigma_1^2+\sigma_2^2+\sqrt{(\sigma_1^2-\sigma_2^2)^2+4\sigma_1^2\sigma_2^2\rho^2}\right]\cdot\frac{1}{2}\left[\sigma_1^2+\sigma_2^2-\sqrt{(\sigma_1^2-\sigma_2^2)^2+4\sigma_1^2\sigma_2^2\rho^2}\right] = \sigma_1^2\sigma_2^2(1-\rho^2)$$
and thus the entropy can be presented as
$$h(Y_1, Y_2) = \frac{1}{2}\log_2\!\left[(2\pi e)^{2}|\boldsymbol{\Sigma}|\right] = \frac{1}{2}\log_2\!\left[(2\pi e)^{2}\sigma_1^2\sigma_2^2(1-\rho^2)\right] = \log_2\!\left(2\pi e\,\sigma_1\sigma_2\sqrt{1-\rho^2}\right)$$
The result confirms the statement that the differential entropy remains unchanged in the transformed frame.
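This invariance can be checked directly in a few lines of NumPy (the covariance parameters below are assumed example values): the entropy computed from $|\boldsymbol{\Sigma}|$ in the original frame matches the entropy computed from the eigenvalues in the rotated frame.

```python
import numpy as np

# Compare h from |Sigma| (original frame) with h from the eigenvalues (transformed frame).
sigma1, sigma2, rho = 2.0, 3.0, 0.6
Sigma = np.array([[sigma1**2, rho*sigma1*sigma2],
                  [rho*sigma1*sigma2, sigma2**2]])
lam = np.linalg.eigvalsh(Sigma)

h_original = 0.5 * np.log2((2 * np.pi * np.e)**2 * np.linalg.det(Sigma))
h_transformed = np.log2(2 * np.pi * np.e) + 0.5 * np.sum(np.log2(lam))
print(h_original, h_transformed)   # identical up to round-off
```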

4. Relative Entropy (Kullback-Leibler Divergence)

In this section, various important issues regarding the relative entropy (Kullback-Leibler divergence) will be delivered. Despite the aforementioned flaws, a useful information theory exists in the continuous case. A key result is that the definitions of relative entropy and mutual information follow naturally from the discrete case and retain their usefulness.
The relative entropy is a type of statistical distance that measures how one probability distribution $f_X$ differs from a second, reference probability distribution $g_X$; it is denoted by
$$D_{KL}(f\,\|\,g) = \int_{-\infty}^{\infty} f_X(x)\log_2\frac{f_X(x)}{g_X(x)}\,dx$$
The detailed derivation is provided in Appendix E. The relative entropy between two Gaussian distributions with different means and variances is given by
$$D_{KL}(f\,\|\,g) = \frac{1}{2}\left[\ln\frac{\sigma_2^2}{\sigma_1^2} + \frac{\sigma_1^2}{\sigma_2^2} + \left(\frac{\mu_1-\mu_2}{\sigma_2}\right)^2 - 1\right]\log_2 e$$
Notice that the relative entropy here is measured in bits, since $\log_2$ is used in the definition. If ln is used instead, it is measured in nats; the only difference in the expression is the factor $\log_2 e$. Several special cases are discussed below.
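Before examining the special cases, the general formula can be checked numerically. The sketch below (parameters assumed for illustration) compares the closed form with direct numerical integration of the defining integral:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def kl_gauss_bits(mu1, s1, mu2, s2):
    """Closed-form KL divergence D(f||g) in bits between N(mu1, s1^2) and N(mu2, s2^2)."""
    nats = 0.5 * (np.log(s2**2 / s1**2) + s1**2 / s2**2 + (mu1 - mu2)**2 / s2**2 - 1)
    return nats / np.log(2)

# Numerical check by integrating f(x) * log2(f(x)/g(x)) over a wide range.
mu1, s1, mu2, s2 = 0.0, 1.0, 1.5, 2.0       # assumed example values
f, g = norm(mu1, s1), norm(mu2, s2)
integrand = lambda x: f.pdf(x) * np.log2(f.pdf(x) / g.pdf(x))
numeric, _ = quad(integrand, -20, 20)

print(kl_gauss_bits(mu1, s1, mu2, s2), numeric)   # the two values agree closely
```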
(1) If $\sigma_1 = \sigma_2 = \sigma$, then $D_{KL}(f\,\|\,g) = \frac{1}{2}\left(\frac{\mu_1-\mu_2}{\sigma}\right)^2\log_2 e$, which is 0 when $\mu_1 = \mu_2$.
Figure 14. Relative entropy as a function of σ and μ1 − μ2 when σ1 = σ2 = σ: (a) three-dimensional surface; (b) contour with entropy gradient.
(2) If $\sigma_1 = \sigma_2 = 1$, then $D_{KL}(f\,\|\,g) = \frac{1}{2}(\mu_1-\mu_2)^2\log_2 e$, which is an even function of $\mu_1-\mu_2$ with a minimum value of 0 when $\mu_1 = \mu_2$.
Figure 15. Variations of relative entropy when σ1 = σ2 = 1: (a) three-dimensional surface as a function of μ1 and μ2; (b) as a function of μ1 − μ2.
- If $\mu_2 = 0$, $D_{KL}(f\,\|\,g) = \frac{1}{2}\mu_1^2\log_2 e$, which is a concave-upward function of $\mu_1$.
- If $\mu_1 = 0$, $D_{KL}(f\,\|\,g) = \frac{1}{2}\mu_2^2\log_2 e$, which is a concave-upward function of $\mu_2$.
(3) If $\mu_1 = \mu_2$, then $D_{KL}(f\,\|\,g) = \frac{1}{2}\left[\ln\frac{\sigma_2^2}{\sigma_1^2} + \frac{\sigma_1^2}{\sigma_2^2} - 1\right]\log_2 e$.
Figure 17. Relative entropy as a function of σ1 and σ2 when μ1 = μ2: (a) the three-dimensional surface; (b) contour with entropy gradient.
- When $\sigma_2 = 1$, $D_{KL}(f\,\|\,g) = \frac{1}{2}\left[\ln\frac{1}{\sigma_1^2} + \sigma_1^2 - 1\right]\log_2 e$.
- When $\sigma_1 = 1$, $D_{KL}(f\,\|\,g) = \frac{1}{2}\left[\ln\sigma_2^2 + \frac{1}{\sigma_2^2} - 1\right]\log_2 e$.
Figure 18. Variations of relative entropy as a function of (a) σ1 when σ2 = 1 is fixed and (b) σ2 when σ1 = 1 is fixed, respectively (μ1 = μ2).
Next, a sensitivity analysis of the relative entropy with respect to changes in the variances and means is performed. The gradient of $D_{KL}(f\,\|\,g)$, given by
$$\nabla D_{KL}(\sigma_1,\sigma_2,\mu_1,\mu_2) = \begin{bmatrix} \dfrac{\partial D_{KL}}{\partial\sigma_1} & \dfrac{\partial D_{KL}}{\partial\sigma_2} & \dfrac{\partial D_{KL}}{\partial\mu_1} & \dfrac{\partial D_{KL}}{\partial\mu_2} \end{bmatrix}^{T}$$
can be calculated using partial derivatives, where the chain rule is involved. Based on the relation $\frac{d}{dx}\ln x = \frac{1}{x}$, we have
$$\frac{\partial}{\partial\sigma_1}\ln\frac{\sigma_2^2}{\sigma_1^2} = \frac{\sigma_1^2}{\sigma_2^2}\cdot\left(-\frac{2\sigma_2^2}{\sigma_1^3}\right) = -\frac{2}{\sigma_1}$$
and the following derivatives are obtained.
$$\text{(1)}\quad \frac{\partial D_{KL}}{\partial\sigma_1} = \frac{\partial}{\partial\sigma_1}\left[\ln\frac{\sigma_2^2}{\sigma_1^2} + \frac{\sigma_1^2}{\sigma_2^2} - 1\right]\frac{\log_2 e}{2} = \left(\frac{\sigma_1}{\sigma_2^2} - \frac{1}{\sigma_1}\right)\log_2 e$$
$$\text{(2)}\quad \frac{\partial D_{KL}}{\partial\sigma_2} = \frac{\partial}{\partial\sigma_2}\left[\ln\frac{\sigma_2^2}{\sigma_1^2} + \frac{\sigma_1^2}{\sigma_2^2} + \frac{(\mu_1-\mu_2)^2}{\sigma_2^2} - 1\right]\frac{\log_2 e}{2} = \left[\frac{1}{\sigma_2} - \frac{\sigma_1^2}{\sigma_2^3} - \frac{(\mu_1-\mu_2)^2}{\sigma_2^3}\right]\log_2 e$$
$$\text{(3)}\quad \frac{\partial D_{KL}}{\partial\mu_1} = \frac{\partial}{\partial\mu_1}\left[\frac{(\mu_1-\mu_2)^2}{\sigma_2^2}\right]\frac{\log_2 e}{2} = \frac{\mu_1-\mu_2}{\sigma_2^2}\log_2 e$$
$$\text{(4)}\quad \frac{\partial D_{KL}}{\partial\mu_2} = \frac{\partial}{\partial\mu_2}\left[\frac{(\mu_1-\mu_2)^2}{\sigma_2^2}\right]\frac{\log_2 e}{2} = \frac{\mu_2-\mu_1}{\sigma_2^2}\log_2 e$$
For optimality in each of the above cases, we have
$$\frac{\partial D_{KL}}{\partial\sigma_1} = \left(\frac{\sigma_1}{\sigma_2^2} - \frac{1}{\sigma_1}\right)\log_2 e = 0 \quad\text{when}\quad \sigma_1^2 = \sigma_2^2$$
$$\frac{\partial D_{KL}}{\partial\sigma_2} = \left[\frac{1}{\sigma_2} - \frac{\sigma_1^2}{\sigma_2^3} - \frac{(\mu_1-\mu_2)^2}{\sigma_2^3}\right]\log_2 e = 0 \quad\text{when}\quad \sigma_2^2 = \sigma_1^2 + (\mu_1-\mu_2)^2$$
$$\frac{\partial D_{KL}}{\partial\mu_1} = \frac{\mu_1-\mu_2}{\sigma_2^2}\log_2 e = 0 \quad\text{when}\quad \mu_1 = \mu_2$$
$$\frac{\partial D_{KL}}{\partial\mu_2} = \frac{\mu_2-\mu_1}{\sigma_2^2}\log_2 e = 0 \quad\text{when}\quad \mu_1 = \mu_2$$
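The stationary conditions can be verified numerically with a finite-difference check (a small sketch under assumed parameter values, not part of the original analysis):

```python
import numpy as np

def kl_bits(s1, s2, m1, m2):
    """KL divergence in bits between N(m1, s1^2) and N(m2, s2^2)."""
    return 0.5 * (np.log(s2**2 / s1**2) + s1**2 / s2**2 + (m1 - m2)**2 / s2**2 - 1) / np.log(2)

# Check that dD/dsigma2 vanishes at sigma2^2 = sigma1^2 + (mu1 - mu2)^2.
s1, m1, m2 = 1.5, 0.0, 2.0                   # assumed example values
s2_star = np.sqrt(s1**2 + (m1 - m2)**2)
eps = 1e-6
dD = (kl_bits(s1, s2_star + eps, m1, m2) - kl_bits(s1, s2_star - eps, m1, m2)) / (2 * eps)
print(dD)                                    # approximately zero at the stationary point
```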

5. Mutual Information

Mutual information is one of many quantities that measure how much one random variable tells us about another. It is a dimensionless quantity, generally expressed in units of bits, and can be thought of as the reduction in uncertainty about one random variable given knowledge of another. The mutual information $I(X;Y)$ between two variables with joint pdf $f_{XY}(x,y)$ is given by
$$I(X;Y) = E\left[\log_2\frac{f_{XY}(x,y)}{f_X(x)f_Y(y)}\right] = \int\!\!\int f_{XY}(x,y)\log_2\frac{f_{XY}(x,y)}{f_X(x)f_Y(y)}\,dx\,dy$$
The mutual information between the random variables X and Y has the following relation
$$I(X;Y) = I(Y;X)$$
where
$$I(X;Y) = h(X) - h(X\mid Y) \ge 0$$
and
$$I(Y;X) = h(Y) - h(Y\mid X) \ge 0$$
implying that $h(X) \ge h(X\mid Y)$ and $h(Y) \ge h(Y\mid X)$. The mutual information of a random variable with itself is the self-information, which is the entropy. High mutual information indicates a large reduction in uncertainty; low mutual information indicates a small reduction; and zero mutual information between two random variables, $I(X;Y) = 0$, means that the variables are independent, in which case $h(X) = h(X\mid Y)$ and $h(Y) = h(Y\mid X)$.
Let’s consider the mutual information between the correlated Gaussian variables X and Y given by
$$I(X;Y) = h(X) + h(Y) - h(X,Y) = \frac{1}{2}\log_2\!\left(2\pi e\,\sigma_x^2\right) + \frac{1}{2}\log_2\!\left(2\pi e\,\sigma_y^2\right) - \frac{1}{2}\log_2\!\left[(2\pi e)^2\sigma_x^2\sigma_y^2(1-\rho^2)\right] = -\frac{1}{2}\log_2(1-\rho^2)$$
Figure 19 presents the mutual information versus $\rho^2$; it grows slowly at first and then very rapidly for high values of $\rho^2$. If $\rho = \pm 1$, the random variables $X$ and $Y$ are perfectly correlated and the mutual information is infinite. It can be seen that $I(X;Y) = 0$ for $\rho = 0$ and that $I(X;Y) \to \infty$ as $\rho \to \pm 1$.
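A short NumPy sketch (assumed example values of $\sigma_x$, $\sigma_y$, and $\rho$) cross-checks the closed form $-\frac{1}{2}\log_2(1-\rho^2)$ against $h(X)+h(Y)-h(X,Y)$ and illustrates the growth with $\rho$:

```python
import numpy as np

def mi_bits(rho):
    """Mutual information between two jointly Gaussian variables with correlation rho, in bits."""
    return -0.5 * np.log2(1 - rho**2)

# Cross-check against I = h(X) + h(Y) - h(X, Y) for assumed sigma_x, sigma_y.
sx, sy, rho = 1.0, 2.0, 0.8
hX = 0.5 * np.log2(2 * np.pi * np.e * sx**2)
hY = 0.5 * np.log2(2 * np.pi * np.e * sy**2)
hXY = 0.5 * np.log2((2 * np.pi * np.e)**2 * sx**2 * sy**2 * (1 - rho**2))
print(mi_bits(rho), hX + hY - hXY)       # the two values agree

for r in (0.0, 0.5, 0.9, 0.99):
    print(r, mi_bits(r))                 # grows rapidly as rho approaches 1
```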
On the other hand, consider the additive white Gaussian noise (AWGN) channel shown in Figure 20; the mutual information is given by
$$I(X;Y) = h(Y) - h(Y\mid X) = \frac{1}{2}\log_2\frac{2\pi e(\sigma_x^2+\sigma_n^2)}{2\pi e\,\sigma_n^2} = \frac{1}{2}\log_2\!\left(1 + \frac{\sigma_x^2}{\sigma_n^2}\right)$$
where
$$h(Y\mid X) = h(N) = h(X,Y) - h(X),$$
and
$$h(Y) = \frac{1}{2}\log_2\!\left[2\pi e(\sigma_x^2+\sigma_n^2)\right];\qquad h(Y\mid X) = h(N) = \frac{1}{2}\log_2\!\left(2\pi e\,\sigma_n^2\right)$$
Mutual information for the additive white Gaussian noise (AWGN) channel is shown in Figure 21, including the three-dimensional surface as a function of $\sigma_x^2$ and $\sigma_n^2$, and also in terms of the signal-to-noise ratio SNR $= \sigma_x^2/\sigma_n^2$. It can be seen that the mutual information grows quickly at first and then much more slowly for high values of the signal-to-noise ratio.
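The AWGN relation $I(X;Y) = \frac{1}{2}\log_2(1+\mathrm{SNR})$ is summarized in the sketch below (the noise and signal powers are assumed example values):

```python
import numpy as np

def awgn_mi_bits(sigma_x2, sigma_n2):
    """Mutual information of the AWGN channel Y = X + N in bits: 0.5*log2(1 + SNR)."""
    return 0.5 * np.log2(1 + sigma_x2 / sigma_n2)

# Assumed example: unit noise power, several signal powers.
for sigma_x2 in (0.1, 1.0, 10.0, 100.0):
    print(f"SNR = {sigma_x2:6.1f} -> I(X;Y) = {awgn_mi_bits(sigma_x2, 1.0):.3f} bits")
```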

6. Conclusions

This paper is intended to serve readers as a supplementary note on the geometric interpretation of the multivariate Gaussian distribution and its entropy, relative entropy, and mutual information. Illustrative examples are employed to provide further insight into this geometric interpretation, enabling readers to interpret the theory correctly for future designs. The fundamental objective is to study the application of multivariate data sets under the Gaussian distribution. The paper examines broad measures of structure for Gaussian distributions and shows that they can be described in terms of the information-theoretic divergence (relative entropy) between the given covariance matrix and correlated random variables. To develop the multivariate Gaussian distribution together with its entropy and mutual information, several key methodologies are presented and supported by illustrations, both technically and statistically. The material allows readers to better perceive the concepts, comprehend the techniques, and properly implement software programs for future study of the topic's science and implementations; it also helps readers grasp the themes' fundamental concepts. Involving relative entropy and mutual information as well as covariance analysis of correlated variables based on differential entropy, a wide range of material is addressed, from basic to application concerns.

Author Contributions

Conceptualization, D.-J.J.; methodology, D.-J.J.; software, D.-J.J.; validation, D.-J.J. and T.-S.C.; writing—original draft preparation, D.-J.J. and T.-S.C.; writing—review and editing, D.-J.J., T.-S.C. and A. B.; supervision, D.-J.J. All authors have read and agreed to the published version of the manuscript.

Funding

The author gratefully acknowledges the support of the National Science and Technology Council, Taiwan under grant number NSTC 111-2221-E-019-047.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Derivation of the differential entropy for the univariate Gaussian distribution

$$\begin{aligned} h(X) &= -E[\log_2 f_X(x)] = -\int_{-\infty}^{\infty} f_X(x)\log_2 f_X(x)\,dx = -\int_{-\infty}^{\infty} f_X(x)\log_2\!\left[\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\right] dx \\ &= -\int_{-\infty}^{\infty} f_X(x)\left[-\frac{1}{2}\log_2(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}\log_2 e\right] dx \\ &= \frac{1}{2}\log_2(2\pi\sigma^2)\int_{-\infty}^{\infty} f_X(x)\,dx + \frac{\log_2 e}{2\sigma^2}\int_{-\infty}^{\infty}(x-\mu)^2 f_X(x)\,dx \\ &= \frac{1}{2}\log_2(2\pi\sigma^2) + \frac{\sigma^2}{2\sigma^2}\log_2 e = \frac{1}{2}\log_2(2\pi e\sigma^2) \end{aligned}$$

Appendix B. Derivation of the differential entropy for the multivariate Gaussian distribution

$$\begin{aligned} h(\mathbf{X}) &= -E\left[\log_2 f_{\mathbf{X}}(\mathbf{x})\right] = -E\left[\log_2\!\left(\frac{1}{\sqrt{(2\pi)^n|\boldsymbol{\Sigma}|}}\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})}\right)\right] \\ &= -E\left[-\frac{n}{2}\log_2(2\pi) - \frac{1}{2}\log_2|\boldsymbol{\Sigma}| - \frac{\log_2 e}{2}(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right] \\ &= \frac{n}{2}\log_2(2\pi) + \frac{1}{2}\log_2|\boldsymbol{\Sigma}| + \frac{\log_2 e}{2}\,E\left[(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right] \\ &= \frac{n}{2}\log_2(2\pi) + \frac{1}{2}\log_2|\boldsymbol{\Sigma}| + \frac{n}{2}\log_2 e = \frac{1}{2}\log_2\!\left[(2\pi e)^n|\boldsymbol{\Sigma}|\right] \end{aligned}$$
The calculation involves the evaluation of expectation of the Mahalanobis distance.

Appendix C. Evaluation of the expectation of the Mahalanobis distance $E[(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})] = n$

$$E\left[(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right] = E\left[\mathrm{tr}\!\left((\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)\right] = E\left[\mathrm{tr}\!\left(\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T\right)\right] = \mathrm{tr}\!\left(\boldsymbol{\Sigma}^{-1}E\left[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T\right]\right) = \mathrm{tr}(\boldsymbol{\Sigma}^{-1}\boldsymbol{\Sigma}) = \mathrm{tr}(\mathbf{I}_n) = n$$
A special case for n = 1
$$E\left[(x-\mu)^T\sigma^{-2}(x-\mu)\right] = E\left[\frac{(x-\mu)^2}{\sigma^2}\right] = \int_{-\infty}^{\infty} f_X(x)\frac{(x-\mu)^2}{\sigma^2}\,dx = \frac{1}{\sigma^2}\int_{-\infty}^{\infty}(x-\mu)^2 f_X(x)\,dx = 1$$

Appendix D. Derivation of the differential entropy in the transformed frame

$$\begin{aligned} h(\mathbf{Y}) &= -E\left[\log_2\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\lambda_i}}\, e^{-\frac{1}{2}\frac{y_i^2}{\lambda_i}}\right] = -\sum_{i=1}^{n}E\left[\log_2\!\left(\frac{1}{\sqrt{2\pi\lambda_i}}\, e^{-\frac{1}{2}\frac{y_i^2}{\lambda_i}}\right)\right] \\ &= -\sum_{i=1}^{n}\int_{-\infty}^{\infty} f_Y(y_i)\left[-\frac{1}{2}\log_2(2\pi\lambda_i) - \frac{1}{2}\frac{y_i^2}{\lambda_i}\log_2 e\right] dy_i = \sum_{i=1}^{n}\left[\frac{1}{2}\log_2(2\pi\lambda_i) + \frac{1}{2}\log_2 e\right] \\ &= \sum_{i=1}^{n}\left[\frac{1}{2}\log_2(2\pi) + \frac{1}{2}\log_2\lambda_i + \frac{1}{2}\log_2 e\right] = \frac{n}{2}\log_2(2\pi e) + \frac{1}{2}\sum_{i=1}^{n}\log_2\lambda_i \end{aligned}$$
The eigenvalues $\lambda_i$ are the diagonal elements of the covariance matrix (i.e., the variances) in the transformed frame. When $\rho = 0$, the eigenvalues equal $\lambda_i = \sigma_i^2$.

Appendix E. Derivation of the Kullback–Leibler divergence between two normal distributions

$$\begin{aligned} D_{KL}(f\,\|\,g) &= \int_{-\infty}^{\infty} f_X(x)\log_2\frac{f_X(x)}{g_X(x)}\,dx = \int_{-\infty}^{\infty} f_X(x)\log_2\frac{\frac{1}{\sqrt{2\pi}\,\sigma_1}e^{-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2}}{\frac{1}{\sqrt{2\pi}\,\sigma_2}e^{-\frac{1}{2}\left(\frac{x-\mu_2}{\sigma_2}\right)^2}}\,dx \\ &= \int_{-\infty}^{\infty} f_X(x)\log_2\frac{\sigma_2}{\sigma_1}\,dx + \int_{-\infty}^{\infty} f_X(x)\log_2\exp\!\left[-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2 + \frac{1}{2}\left(\frac{x-\mu_2}{\sigma_2}\right)^2\right] dx \\ &= \log_2\frac{\sigma_2}{\sigma_1} - \frac{\log_2 e}{2\sigma_1^2}\int_{-\infty}^{\infty} f_X(x)(x-\mu_1)^2\,dx + \frac{\log_2 e}{2\sigma_2^2}\int_{-\infty}^{\infty} f_X(x)(x-\mu_2)^2\,dx \\ &= \log_2\frac{\sigma_2}{\sigma_1} - \frac{\log_2 e}{2} + \frac{\log_2 e}{2\sigma_2^2}\int_{-\infty}^{\infty} f_X(x)\left[(x-\mu_1) + (\mu_1-\mu_2)\right]^2 dx \\ &= \log_2\frac{\sigma_2}{\sigma_1} - \frac{\log_2 e}{2} + \frac{\log_2 e}{2\sigma_2^2}\int_{-\infty}^{\infty} f_X(x)\left[(x-\mu_1)^2 + (\mu_1-\mu_2)^2 + 2(x-\mu_1)(\mu_1-\mu_2)\right] dx \\ &= \frac{1}{2}\log_2\frac{\sigma_2^2}{\sigma_1^2} - \frac{\log_2 e}{2} + \frac{\log_2 e}{2\sigma_2^2}\left[\sigma_1^2 + (\mu_1-\mu_2)^2\right] = \frac{1}{2}\left[\ln\frac{\sigma_2^2}{\sigma_1^2} + \frac{\sigma_1^2}{\sigma_2^2} + \left(\frac{\mu_1-\mu_2}{\sigma_2}\right)^2 - 1\right]\log_2 e \end{aligned}$$
where the identity $\log_2(\cdot) = \log_2 e\,\ln(\cdot)$ was used.

References

  1. Verdú, S. (1990). On channel capacity per unit cost, IEEE Trans. Inf. Theory, vol. 36, no. 5, pp. 1019–1030. [CrossRef]
  2. Lapidoth, A. and Shamai (Shitz), S. (2002). Fading channels: How perfect need perfect side information be? IEEE Trans. Inf. Theory, vol. 48, no. 5, pp. 1118–1134.
  3. Verdú, S. (2002). Spectral efficiency in the wideband regime, IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1319–1343. [CrossRef]
  4. Prelov, V. and Verdú, S. (2004). Second-order asymptotics of mutual information, IEEE Trans. Inf. Theory, vol. 50, no. 8, pp. 1567–1580. [CrossRef]
  5. Kailath, T. (1968). A note on least squares estimates from likelihood ratios, Inf. Contr., vol. 13, pp. 534–540. [CrossRef]
  6. Kailath, T. (1969). A general likelihood-ratio formula for random signals in Gaussian noise, IEEE Trans. Inf. Theory, vol. IT-15, no. 2, pp. 350–361. [CrossRef]
  7. Kailath, T. (1970). A further note on a general likelihood formula for random signals in Gaussian noise, IEEE Trans. Inf. Theory, vol. IT-16, no. 4, pp. 393–396. [CrossRef]
  8. Jaffer A. G. and Gupta S. C. (1972). On relations between detection and estimation of discrete time processes, Inf. Contr., vol. 20, pp. 46–54. [CrossRef]
  9. Jwo, D. J., Biswal, A. (2023). Implementation and Performance Analysis of Kalman Filters with Consistency Validation. Mathematics, 11, 521. [CrossRef]
  10. Duncan, T. E. (1970). On the calculation of mutual information, SIAM J. Applied Mathematics, vol. 19, pp. 215–220. [CrossRef]
  11. Kadota, T. T., Zakai, M. and Ziv, J. (1971). Mutual information of the white Gaussian channel with and without feedback, IEEE Trans. Inf. Theory, vol. IT-17, no. 4, pp. 368–371. [CrossRef]
  12. Amari, S. I. (2016). Information geometry and its applications, Vol. 194. Springer.
  13. Schneidman, E., Still, S., Berry, M. J. and Bialek, W. (2003). Network information and connected correlations. Physical review letters, 91(23), p.238701. [CrossRef]
  14. Timme, N., Alford, W., Flecker, B. and Beggs, J. M. (2014). Synergy, redundancy, and multivariate information measures: an experimentalist’s perspective. Journal of computational neuroscience, 36, pp.119-140.
  15. Liang, K. C. and Wang, X. (2008). Gene regulatory network reconstruction using conditional mutual information. EURASIP Journal on Bioinformatics and Systems Biology, 2008, pp.1-14. [CrossRef]
  16. Panzeri, S., Magri, C. and Logothetis, N. K. (2008). On the use of information theory for the analysis of the relationship between neural and imaging signals. Magnetic resonance imaging, 26(7), pp.1015-1025. [CrossRef]
  17. Katz, Y., Tunstrøm, K., Ioannou, C. C., Huepe, C. and Couzin, I. D. (2011). Inferring the structure and dynamics of interactions in schooling fish. Proceedings of the National Academy of Sciences, 108(46), pp.18720-18725. [CrossRef]
  18. Cutsuridis, V., Hussain, A. and Taylor, J. G. eds. (2011). Perception-action cycle: Models, architectures, and hardware. Springer Science & Business Media.
  19. Ay, N., Bernigau, H., Der, R. and Prokopenko, M. (2012). Information-driven self-organization: the dynamical system approach to autonomous robot behavior. Theory in Biosciences, 131, pp.161-179. [CrossRef]
  20. Rosas, F., Ntranos, V., Ellison, C. J., Pollin, S. and Verhelst, M. (2016). Understanding interdependency through complex information sharing. Entropy, 18(2), p.38. [CrossRef]
  21. Ince, R. A. (2017). The Partial Entropy Decomposition: Decomposing multivariate entropy and mutual information via pointwise common surprisal. arXiv preprint arXiv:1702.01591.
  22. Perrone, P. and Ay, N. (2016). Hierarchical quantification of synergy in channels. Frontiers in Robotics and AI, 2, p.35. [CrossRef]
  23. Bertschinger, N., Rauh, J., Olbrich, E., Jost, J. and Ay, N. (2014). Quantifying unique information. Entropy, 16(4), pp.2161-2183. [CrossRef]
  24. Harder, M., Salge, C. and Polani, D. (2013). Bivariate measure of redundant information. Physical Review E, 87(1), p.012130. [CrossRef]
  25. Rauh, J., Banerjee, P. K., Olbrich, E., Jost, J. and Bertschinger, N. (2017). On extractable shared information. Entropy, 19(7), p.328. [CrossRef]
  26. Ince, R. A. (2017). Measuring multivariate redundant information with pointwise common change in surprisal. Entropy, 19(7), p.318. [CrossRef]
  27. Chicharro, D. and Panzeri, S. (2017). Synergy and redundancy in dual decompositions of mutual information gain and information loss. Entropy, 19(2), p.71. [CrossRef]
Figure 1. Standard parametric representation of an ellipse following de La Hire's point construction.
Figure 3. The position of the ellipse for various correlation coefficients given the angle of inclination (specify θ to obtain ρ = σ12/(σ1σ2)): (a) θ = 30°, ρ ≈ 0.55; (b) θ = 0°, ρ = 0; (c) θ = 150°, ρ ≈ −0.55.
Figure 4. The position of the ellipse for various values of the correlation coefficient given the angle of inclination (specify ρ to obtain θ): (a) ρ = 0.95, θ = 45°; (b) ρ = 0, θ = 0°; (c) ρ = −0.95, θ = 135°.
Figure 5. Equal variances σ1 = σ2 = σ for a fixed ρ = 0.5: (a) σ = 2; (b) σ = 3; (c) σ = 4; (d) σ = 5.
Figure 6. Ellipses for (a) ρ = 0.5 with varying variances σ1 = σ2 = σ = 2 ~ 5; (b) equal variances σ1 = σ2 = 2 with varying ρ = 0, 0.5, 0.9, 0.99.
Figure 7. σ1 > σ2, σ1/σ2 increases (σ1 = 2 ~ 5, σ2 = 2) for a fixed ρ = 0.5.
Figure 8. Ellipses when σ1 ≠ σ2 for a fixed ρ = 0.5: (a) σ1 > σ2, σ1/σ2 increases, where σ1 = 2 ~ 5 and σ2 = 2; (b) σ2 > σ1, σ2/σ1 increases, where σ2 = 2 ~ 5 and σ1 = 2.
Figure 9. Variation of the inclination angle as a function of σ1 and σ2 for (a) ρ = 0.5; (b) ρ = 0.
Figure 10. σ1 > σ2 (σ1 = 4, σ2 = 2) with (a) ρ = 0, 0.5, 0.9, 0.99; (b) ρ = 0, −0.5, −0.9, −0.99, as compared to σ2 > σ1 (σ1 = 2, σ2 = 4) with (c) ρ = 0, 0.5, 0.9, 0.99; (d) ρ = 0, −0.5, −0.9, −0.99.
Figure 11. Comparison of the ellipses for various σ1 and σ2: (i) σ1 = 2, σ2 = 4; (ii) σ1 = 4, σ2 = 2; (iii) σ1 = σ2 = 2; (iv) σ1 = σ2 = 4, with fixed ρ = 0.5.
Figure 12. The differential entropy as a function of σ² for a univariate Gaussian variable.
Figure 13. Differential entropy for the bivariate Gaussian distribution (a) as a function of ρ² and σ²; (b) as a function of ρ² when σ1 = σ2 = 1.
Figure 19. Mutual information versus ρ² between the correlated Gaussian variables.
Figure 20. Schematic illustration of the additive white Gaussian noise (AWGN) channel.
Figure 21. Mutual information for the additive white Gaussian noise (AWGN) channel: (a) the three-dimensional surface as a function of σx² and σn²; (b) in terms of the signal-to-noise ratio.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.