Kolchinsky [21] introduced a measure of union information and showed that this measure is equivalent to the one proposed in [15] and to the one in [14], in the sense that the three of them achieve the same optimum value [21]. The multivariate extension of this measure was proposed by Griffith and Koch [15], defined as
\[
I_\cup^{GK}(T; A_1, \ldots, A_k) \;=\; \min_{s \in \Delta_p} I_s(T; X_1, \ldots, X_n),
\qquad
\Delta_p = \{\, s : s(a_i, t) = p(a_i, t),\ i = 1, \ldots, k \,\},
\]
where the sources $A_i$ are subsets of $\{X_1, \ldots, X_n\}$; we present this form because it makes clear what conditions are enforced upon the marginals.
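To make this optimization concrete, the following minimal Python sketch (our own illustration, not code from [15] or [21]; the function names and the choice of SciPy's SLSQP solver are ours) estimates $I_\cup^{GK}$ for two singleton sources by minimizing the sources-target mutual information over joint distributions that preserve the $(X_1, T)$ and $(X_2, T)$ marginals of $p$:

import numpy as np
from scipy.optimize import minimize

def mi_sources_target(q):
    # I_q(T; X1, X2) in bits, for a joint distribution q of shape (|X1|, |X2|, |T|).
    q = np.clip(q, 1e-12, None)
    q = q / q.sum()
    p_x1x2 = q.sum(axis=2, keepdims=True)
    p_t = q.sum(axis=(0, 1), keepdims=True)
    return float(np.sum(q * np.log2(q / (p_x1x2 * p_t))))

def gk_union_information(p):
    # Minimize I_s(T; X1, X2) over joints s preserving the (X1,T) and (X2,T)
    # marginals of p (singleton sources). Returns the (locally) optimal value and s.
    shape = p.shape
    m1, m2 = p.sum(axis=1), p.sum(axis=0)

    def objective(flat):
        return mi_sources_target(flat.reshape(shape))

    def marginal_gap(flat):
        s = flat.reshape(shape)
        return np.concatenate([(s.sum(axis=1) - m1).ravel(),
                               (s.sum(axis=0) - m2).ravel()])

    res = minimize(objective, p.ravel(), method="SLSQP",
                   bounds=[(0.0, 1.0)] * p.size,
                   constraints=[{"type": "eq", "fun": marginal_gap}])
    return res.fun, res.x.reshape(shape)

# Example: AND gate with uniform inputs, p[x1, x2, t].
p = np.zeros((2, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        p[x1, x2, x1 & x2] = 0.25
value, s_opt = gk_union_information(p)
print("I_union^GK =", value)

Since the objective is not convex in the joint distribution, a generic solver of this kind only guarantees a local optimum; the original distribution $p$ is a convenient feasible starting point.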
There is a relation between $I_\cup^{GK}$ and our measure of union information whenever the sources are singletons. In this case, and only in this case, does the set involved in the computation of our measure have only one element. Since this distribution, as well as the original distribution $p$, are both admissible points in $\Delta_p$, we have that $I_\cup^{GK}$ can be no larger than the value attained by our measure, which implies that, for singleton sources, the two union informations (and hence the corresponding synergies) are trivially comparable.
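As a small self-contained numerical check of this argument (our own toy example; the AND gate and the variable names are ours), note that the conditional-independence distribution built from the pairwise source-target marginals lies in $\Delta_p$, so the minimum defining $I_\cup^{GK}$ can exceed neither the mutual information that distribution attains nor $I_p(T; X_1, X_2)$:

import numpy as np

def mi_sources_target(q):
    # I_q(T; X1, X2) in bits, for q of shape (|X1|, |X2|, |T|).
    q = np.clip(q, 1e-12, None); q = q / q.sum()
    return float(np.sum(q * np.log2(q / (q.sum(axis=2, keepdims=True) *
                                          q.sum(axis=(0, 1), keepdims=True)))))

# AND gate with uniform inputs: p[x1, x2, t].
p = np.zeros((2, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        p[x1, x2, x1 & x2] = 0.25

# Conditional-independence distribution with the same (X1,T) and (X2,T) marginals as p.
p_t = p.sum(axis=(0, 1))
q_ci = np.einsum('it,jt,t->ijt', p.sum(axis=1), p.sum(axis=0), 1.0 / p_t)

# q_ci preserves the source-target marginals, hence it is an admissible point.
assert np.allclose(q_ci.sum(axis=1), p.sum(axis=1)) and np.allclose(q_ci.sum(axis=0), p.sum(axis=0))

print("I_p  (T; X1,X2) =", mi_sources_target(p))     # one upper bound on the GK minimum
print("I_qci(T; X1,X2) =", mi_sources_target(q_ci))  # another upper bound on the GK minimum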
On the other hand, if there is at least one source that is not a singleton, the measures are not trivially comparable. For example, suppose we wish to compute the two measures for a collection of sources in which at least one is not a singleton. We know that the solution of the optimization defining $I_\cup^{GK}$ is a distribution whose source-target marginals must coincide with the marginals under the original $p$. However, in the computation of our measure, it may be the case that the resulting distribution is not in the set $\Delta_p$ involved in the computation of $I_\cup^{GK}$, and that it achieves a lower mutual information with $T$. That is, it might be the case that its mutual information with $T$ is smaller than $I_s(T; X_1, \ldots, X_n)$, for all $s \in \Delta_p$. In such a case, we would have the reverse ordering, with our union information strictly below $I_\cup^{GK}$.
It is convenient to be able to upper-bound certain measures with other measures. For example, Gomes and Figueiredo [25] (see that paper for the definitions of the measures in question) showed that, for any source, several of those measures satisfy a chain of inequalities. However, we argue that the inability to draw such strong conclusions (or bounds) in general is a positive aspect of PID. This is because there are many different ways of defining the information (be it redundant, unique, union, etc.) that one wishes to capture. If one could trivially relate all measures, it would be possible to know a priori how those measures behave relative to one another. Consequently, there would be little freedom in how to quantify different information concepts: all measures would capture not equivalent, but merely similar, types of information, since they would all be ordered. It is precisely because different measures of information cannot be trivially ordered that PID provides a rich and complex framework for distinguishing different types of information, although we believe PID is still in its infancy.
James et al. [16] introduced a measure of unique information, which we recall now. In the bivariate case (i.e., two sources $X_1, X_2$ and a target $T$), let $q$ be the maximum entropy distribution that preserves the marginals $p(x_1, t)$ and $p(x_2, t)$, and let $r$ be the maximum entropy distribution that preserves the marginals $p(x_1, x_2)$, $p(x_1, t)$, and $p(x_2, t)$. Although there is no closed form for $r$, which has to be computed using an iterative algorithm [37], it may be shown that the solution for $q$ is $q(x_1, x_2, t) = p(x_1, t)\, p(x_2, t) / p(t)$. This is the same distribution $q$ that we consider for the bivariate decomposition (3).
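To make the two maximum entropy problems concrete, here is a small self-contained Python sketch (our own; the generic iterative proportional fitting routine used for $r$ is a standard choice and not necessarily the exact algorithm of [37]) that computes $q$ in closed form and fits $r$ by cycling through the three pairwise marginal constraints, using the AND gate as an example:

import numpy as np

def ipf_pairwise(p, n_sweeps=2000, tol=1e-12):
    # Maximum entropy distribution preserving the (X1,X2), (X1,T) and (X2,T)
    # marginals of p, fitted by iterative proportional fitting from the uniform.
    r = np.ones_like(p) / p.size
    targets = [(2, p.sum(axis=2)), (1, p.sum(axis=1)), (0, p.sum(axis=0))]
    for _ in range(n_sweeps):
        prev = r.copy()
        for axis, target in targets:
            cur = r.sum(axis=axis, keepdims=True)
            ratio = np.divide(np.expand_dims(target, axis), cur,
                              out=np.zeros_like(cur), where=cur > 0)
            r = r * ratio
        if np.max(np.abs(r - prev)) < tol:
            break
    return r

# AND gate with uniform inputs, p[x1, x2, t] (note that p(t) > 0 is assumed below).
p = np.zeros((2, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        p[x1, x2, x1 & x2] = 0.25

# Closed-form q: sources conditionally independent given T, with the (X_i, T) marginals of p.
q = np.einsum('it,jt,t->ijt', p.sum(axis=1), p.sum(axis=0), 1.0 / p.sum(axis=(0, 1)))

r = ipf_pairwise(p)
for axis in (0, 1, 2):   # r matches all three pairwise marginals of p
    assert np.allclose(r.sum(axis=axis), p.sum(axis=axis), atol=1e-6)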
James et al. [16] suggest defining the unique information of $X_1$ as the least change (in sources-target mutual information) that results from adding the $(X_1, T)$ marginal constraint, and analogously for $X_2$. They show that their measure yields a nonnegative decomposition for the bivariate case. Combining these definitions with the consistency equations of the bivariate decomposition, some algebra leads to
\[
S_J(T; X_1, X_2) = I_p(T; X_1, X_2) - I_r(T; X_1, X_2),
\]
where $S_J$ is the synergy resulting from the decomposition of James et al. [16] in the bivariate case. Recall that our measure of synergy for the bivariate case is given by
\[
S(T; X_1, X_2) = I_p(T; X_1, X_2) - I_q(T; X_1, X_2).
\]
The similarity is striking.
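Putting the pieces together (again our own illustration, based on the bivariate expressions for $S$ and $S_J$ written above; the construction of $q$ and $r$ is repeated so that the snippet runs on its own), the two synergies can be compared on the AND gate:

import numpy as np

def mi(q):
    # I_q(T; X1, X2) in bits, for a joint q of shape (|X1|, |X2|, |T|).
    q = np.clip(q, 1e-15, None); q = q / q.sum()
    return float(np.sum(q * np.log2(q / (q.sum(axis=2, keepdims=True) *
                                          q.sum(axis=(0, 1), keepdims=True)))))

def ipf_pairwise(p, n_sweeps=2000):
    # Max-entropy fit of the three pairwise marginals of p (IPF from the uniform).
    r = np.ones_like(p) / p.size
    for _ in range(n_sweeps):
        for axis in (2, 1, 0):
            cur = r.sum(axis=axis, keepdims=True)
            r = r * np.divide(np.expand_dims(p.sum(axis=axis), axis), cur,
                              out=np.zeros_like(cur), where=cur > 0)
    return r

p = np.zeros((2, 2, 2))                    # AND gate with uniform inputs
for x1 in (0, 1):
    for x2 in (0, 1):
        p[x1, x2, x1 & x2] = 0.25

q = np.einsum('it,jt,t->ijt', p.sum(axis=1), p.sum(axis=0), 1.0 / p.sum(axis=(0, 1)))
r = ipf_pairwise(p)

print("S   =", mi(p) - mi(q))   # our bivariate synergy
print("S_J =", mi(p) - mi(r))   # synergy of James et al., as written above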
Computing the decomposition of James et al. [16] for the bivariate distributions in Table 9 shows that it coincides with the decomposition given by our measure, except for the AND distribution, where we obtained a larger synergy than theirs. We could not obtain their decomposition for the RDNUNQXOR distribution because the algorithm that computes $r$ did not finish in the allotted time of 10 minutes. James et al. [16] showed that, for any bivariate distribution, $I_q(T; X_1, X_2) \le I_r(T; X_1, X_2)$; therefore, for the bivariate case, we have $S(T; X_1, X_2) \ge S_J(T; X_1, X_2)$.
Unfortunately, the measure of unique information proposed by James et al. [16], unlike the usual proposals of intersection or union information, does not allow for the computation of the partial information atoms of the complete redundancy lattice when there are more than two sources. The authors also comment that it is not clear whether their measure satisfies monotonicity in that case. Naturally, our measure is not the same as theirs, so it does not retain the operational interpretation of the unique information of a source as the least change in the information about $T$ that results from adding the corresponding marginal constraint to the maximum entropy distributions involved. Given the form of $S_J$, one could define a corresponding measure of union information (the mutual information under $r$) and study its properties. Clearly, it does not satisfy the self-redundancy axiom, but we wonder whether it could be adjusted so that it satisfies all of the proposed axioms. The bivariate decomposition retains the operational interpretation of the original measure, but it is not clear whether this remains true beyond the bivariate case, in which the maximum entropy distributions that we wrote as $q$ and $r$ have different definitions [16]. We leave this for future work.