1. Introduction
The concept of physical entropy has been controversial since the times of Boltzmann and Gibbs. The following passage from an interview with Shannon, whose fundamental contribution to the understanding of entropy in the information-theoretic sense is today widely recognized also in physics [1], can be found in [2]:
My greatest concern was what to call it. I thought of calling it 'information,' but the word was overly used, so I decided to call it 'uncertainty.' When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, "You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage."
Over the years, the number of different interpretations and definitions of entropy has grown; a recent collection of heterogeneous "entropies" is reported in [3].
In this fragmented and ambiguous context, one of the most debated points is the relationship between information and physical entropy. The controversy about the role of information in physics has its roots in the famous thought experiments of Maxwell and Szilard, and is still an object of discussion today. On the one side, Landauer claims in [4] that
Information is a physical entity.
On the other side, Maroney writes in his thesis [5] that
The Szilard Engine is unsuccessful as a paradigm of the information-entropy link,
and Maroney and Timpson reiterate the same concept in [6]:
rejecting the claims that information is physical provides a better basis for understanding the fertile relationship between information theory and physics.
Independently of the different opinions and arguments, it is a matter of fact that entropy is the key quantity that allows one to derive all the thermodynamic macro-properties of many-particle systems from the statistical properties of the constituent microscopic particles.
One standard approach to entropy in the classical (non-quantum) regime is based on the Gibbs entropy of microstates which, for systems at equilibrium with the environment, is the Shannon entropy of the probability distribution resulting from entropy maximization under the constraints imposed on the system by the surrounding environment. The principle of maximization of the entropy of microstates, often referred to as MaxEnt, dates back to the seminal paper of Jaynes [7] and is inspired by Shannon's information-theoretic view of entropy. Regrettably, the probability distribution of microstates does not account for the indistinguishability of particles. Since indistinguishability is essential, it must be incorporated into the framework. This is typically done by subtracting from the entropy of microstates the Gibbsian correction term $\ln N!$, where $N$ is the number of the system's particles.
The Gibbsian correction term enters the scene almost like a sudden intervention, akin to a deus ex machina, but the subtraction of $\ln N!$ generates a glaring contradiction: when the entropy of microstates is smaller than $\ln N!$, the difference between the entropy of microstates and $\ln N!$ becomes negative, as actually happens with the Sackur-Tetrode entropy formula at low temperature-to-density ratio. Since the entropy of any random variable is guaranteed to be non-negative, the entropy of microstates minus $\ln N!$ cannot be an entropy. This pathology is not surprising because, after the subtraction of $\ln N!$, the random variable whose "entropy" is calculated is no longer specified, so "entropy" becomes a mere formula attributed to an unspecified random variable that represents an unspecified physical system. In this controversial situation, more than 100 years after its introduction, the Gibbsian $\ln N!$ is still an object of research and debate today, see e.g. [8,9].
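This pathology is easy to exhibit numerically. The following sketch (our own illustration, not a model from the literature: a two-level toy system with a strongly skewed, low-temperature-like one-particle distribution) compares the Gibbs-corrected entropy of microstates with the exact entropy of the occupancy numbers:

```python
import math

def binomial_occupancy_entropy(N, p):
    """Shannon entropy (nats) of the occupancy numbers of a two-level
    system of N particles with one-particle distribution (p, 1-p).
    The occupancy number of level 0 is Binomial(N, p)."""
    H = 0.0
    for k in range(N + 1):
        prob = math.comb(N, k) * p**k * (1 - p) ** (N - k)
        if prob > 0:
            H -= prob * math.log(prob)
    return H

N, p = 50, 0.99  # almost all probability mass on the ground state
H_micro = -N * (p * math.log(p) + (1 - p) * math.log(1 - p))   # N * H(one-particle)
gibbs_corrected = H_micro - math.log(math.factorial(N))        # entropy of microstates minus ln N!

print(gibbs_corrected)                    # negative: cannot be an entropy
print(binomial_occupancy_entropy(N, p))   # non-negative, as any Shannon entropy
```

The corrected quantity goes negative, while the exact entropy of the occupancy numbers cannot.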
This paper recovers the coherency of the picture by considering, in place of the classical setting, the more general quantum setting, where entropy is the von Neumann entropy of the mixed state of the system. The mixed state of a system is commonly represented by a density operator living in the Hilbert space obtained as the $N$-fold tensor product of the single-particle Hilbert space. Exactly as happens with the entropy of microstates, the von Neumann entropy of this density operator is appropriate for systems of distinguishable particles, but it does not capture the indistinguishability of particles. We observe that, to capture the indistinguishability of particles of a bosonic system, i.e. a system whose state remains unchanged when the particles' indexes are permuted, an eigenbasis that conveniently represents its quantum state is that of the vectors of the occupancy numbers, giving rise to the so-called second quantization formalism. The consideration of the indistinguishability of particles and, with it, of the occupancy numbers, leads us to define the bosonic density operator, i.e. the density operator in the system's bosonic subspace. Due to the inherent capability of the bosonic density operator to capture the indistinguishability of particles, its von Neumann entropy is the entropy of the system. This allows overcoming the standard approach to entropy based on distinguishable particles and resolves the puzzle of the Gibbsian $\ln N!$, which, as shown in this paper, now enters the scene through the main entrance, alongside other quantum correction terms ensuring the non-negativity of entropy.
Our exploration of the quantum approach is situated within the framework of contemporary quantum thermodynamics, whose origins can be attributed to [10,11]. The two cited papers, which have deeply influenced the subsequent literature, e.g. [12,13,14,15,16], derive the mixed state of a system at equilibrium with the environment by tracing out the environment from the universe, which, in the thermal case, is the union of the system and the heat bath that thermalizes it. The analysis of [10] and [11] shows that, as the number of particles of the universe tends to infinity, the mixed state of the system is the same, at least in the weak sense, whether the universe is in a pure state or in a mixed state. Purity of the state of the universe makes the introduction of a statistical ensemble of "universes" unnecessary, overcoming the subjectivism inherent in the Bayesian approach.
The outline of the paper is as follows. Section II introduces the bosonic Hilbert subspace and its eigenbasis. In Section III, we derive the bosonic density operator for the system under consideration by tracing out the environment from the universe, assuming that the universe is in a bosonic eigenstate. With this assumption, we show that the probability distribution that weights the projectors of the bosonic Hilbert subspace of the system is the multivariate hypergeometric distribution. As the number of bosons of the universe tends to infinity, the multivariate hypergeometric distribution converges to the multinomial distribution, which we identify as the canonical distribution of the occupancy numbers. Section IV shows that, if, as in the Bayesian approach, the universe is assumed to be in a mixed bosonic state, then the distribution of the occupancy numbers of the system is multinomial, provided that the occupancy numbers of the universe are multinomially distributed too. Section V discusses the application of the mentioned probability distributions to the entropy of physical systems and places our work within the framework of quantum information theory, unveiling an engaging connection between Bayesianism and empiricism in physics. To illustrate the intrinsic capacity of the canonical bosonic density operator to capture the indistinguishability of particles in systems at equilibrium, Section VI shows that its von Neumann entropy fits the entropy of the ideal gas in a container. Section VII sketches a future application of our approach to the Szilard engine. Finally, in Section VIII we draw our conclusions.
3. Empirical approach
We want to obtain the system's bosonic density operator, conditioned on the occupancy macrostate of the universe, by tracing out the environment, made of the remaining $U-N$ bosons, from the projector of the universe made of $U$ bosons. The first step is to write the projector in the form of a bosonic purification:
where windowing of a vector between $i$ and $j$ denotes the restriction to its entries from $i$ to $j$, and it is understood that the set under consideration is empty when one or more entries of the corresponding occupancy vector are negative. Observing that the occupancy numbers of the system and of the environment must sum to those of the universe, a condition that forces the occupancy numbers of the environment, we conclude that
where the trace is taken over the Hilbert space of the environment. If the set under consideration is empty, then one or more entries of the corresponding occupancy vector are negative, in which case the associated multinomial coefficient is zero by definition. The fraction appearing in (2) is the multivariate hypergeometric distribution, which is the distribution of the occupancy numbers of colors in drawing without replacement $N$ balls out of an urn containing $U$ balls with given color occupancy numbers. In many textbooks and papers, the multivariate hypergeometric distribution is expressed by binomial coefficients as
where $P(\cdot)$ denotes the probability of the random variable inside the round brackets and the subscript indicates that the probability distribution depends on the vector of known parameters.
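As a sanity check, the multivariate hypergeometric pmf can be evaluated directly from binomial coefficients. In the sketch below (notation ours: $n$ for the drawn occupancy numbers, $u$ for those of the urn, both illustrative choices), the pmf is verified to sum to one over all admissible occupancy vectors:

```python
import math
from itertools import product as iproduct

def mv_hypergeometric_pmf(n, u):
    """Pmf of drawing occupancy numbers n = (n_1, ..., n_C) without
    replacement from an urn with color occupancy numbers u = (u_1, ..., u_C)."""
    N, U = sum(n), sum(u)
    num = math.prod(math.comb(uc, nc) for uc, nc in zip(u, n))
    return num / math.comb(U, N)

u = (5, 3, 2)   # urn: 10 balls in 3 colors (illustrative)
N = 4           # balls drawn
total = sum(
    mv_hypergeometric_pmf(n, u)
    for n in iproduct(*(range(uc + 1) for uc in u))
    if sum(n) == N
)
print(total)  # → 1.0 up to floating point, by the Vandermonde identity
```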
In the thermodynamic limit the multivariate hypergeometric distribution converges to the multinomial distribution, i.e. the distribution of the occupancy numbers of colors in drawing with replacement $N$ times a ball out of an urn containing colored balls with given relative frequency of each color $c$:
where (4) is Stirling's formula, while, by regarding the occupancy macrostate as the result of a PVM operated on the universe, equation (5) is recognized to be the Law of Large Numbers (LLN) for the empirical one-particle distribution.
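The convergence can be illustrated numerically. In the following sketch (parameters of our choosing), the total variation distance between the multivariate hypergeometric distribution and the multinomial distribution with the same one-particle probabilities shrinks as the size $U$ of the universe grows, with the system size $N$ held fixed:

```python
import math

def mvh_pmf(n, u):
    # multivariate hypergeometric pmf (notation ours)
    return math.prod(math.comb(uc, nc) for uc, nc in zip(u, n)) / math.comb(sum(u), sum(n))

def multinomial_pmf(n, p):
    N = sum(n)
    coef = math.factorial(N) / math.prod(math.factorial(nc) for nc in n)
    return coef * math.prod(pc**nc for pc, nc in zip(p, n))

p = (0.5, 0.3, 0.2)   # fixed one-particle distribution (illustrative)
N = 3                 # system size held fixed

def tv_distance(U):
    """Total variation between the two laws, with the urn occupancy
    numbers of the universe proportional to p."""
    u = tuple(int(U * pc) for pc in p)
    states = [(a, b, N - a - b) for a in range(N + 1) for b in range(N + 1 - a)]
    return 0.5 * sum(abs(mvh_pmf(n, u) - multinomial_pmf(n, p)) for n in states)

print(tv_distance(100), tv_distance(10000))  # the distance shrinks as U grows
```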
Concentration inequalities that bound the probability of deviations of the empirical probability from its expectation, that is, the probability of occurrence of non-typical bosonic eigenstates of the universe, can be found in [20] for hypergeometrically distributed random variables and in [21] for multinomially distributed random vectors. Paper [22] demonstrates that the multinomial distribution is the maximum-entropy distribution of the occupancy numbers with constrained one-particle distribution. See also [23,24] for the multinomial distribution in statistical mechanics.
By equiprobability of the disjoint microstates belonging to the same occupancy macrostate, we see that, in (3), the joint distribution of microstates factorizes in the thermodynamic limit, which explicitly shows that microstates resulting from the mentioned PVM become independent and identically distributed random variables. What happens is that the LLN cancels the dependencies inside the joint distribution of microstates induced by the constraints imposed on the colors of the $N$ balls by the occupancy numbers of the universe in drawing without replacement. Furthermore, the probability distribution also becomes independent of the specific result of the PVM operated on the universe, in the sense that the empirical one-particle distribution is the same for almost every bosonic eigenstate of the universe, i.e., for the typical bosonic eigenstates of the universe. The one-particle distribution and, with it, the product distribution, depend on the physical constraints imposed on the universe. When the constraint is the temperature, it is widely recognized that the one-particle distribution is the Boltzmann distribution, which can be found by Jaynes' constrained maximization of entropy [7].
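For the Boltzmann distribution, the maximum-entropy structure is reflected in the exact identity $H = \beta \langle E \rangle + \ln Z$. The following sketch verifies it on a hypothetical set of energy levels (arbitrary units, our choice):

```python
import math

def boltzmann(energies, beta):
    """Boltzmann distribution exp(-beta*E)/Z over the given levels."""
    w = [math.exp(-beta * E) for E in energies]
    Z = sum(w)
    return [wi / Z for wi in w], Z

energies = [0.0, 1.0, 2.0, 5.0]   # hypothetical levels, arbitrary units
beta = 0.7                        # 1/(kB*T) in matching units
p, Z = boltzmann(energies, beta)

H = -sum(pi * math.log(pi) for pi in p)                  # Shannon entropy (nats)
mean_E = sum(pi * Ei for pi, Ei in zip(p, energies))     # mean energy
# MaxEnt identity for the Boltzmann distribution: H = beta*<E> + ln Z
print(H, beta * mean_E + math.log(Z))
```

The identity is exact because $\ln p_c = -\beta E_c - \ln Z$, so the two printed values coincide up to floating point.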
Independence and identical distribution of the eigenstates of the individual bosons of the system lead to the following density operator, which we claim to be the canonical density operator in the system's bosonic subspace. Papers [10] and [11] claim that the system's density operator converges to the canonical density operator as $U \to \infty$ for almost every pure state of the universe, i.e. for the typical pure states of the universe. For this reason, convergence to the canonical state is called canonical typicality in [11]. Typicality, which can be intended in various senses, is often invoked in statistical mechanics. For instance, paper [25] uses the properties of the information-theoretic typical set to characterize the weak convergence of microstates to equiprobability in the context of classical statistical mechanics. We point out that the claim of [10] and [11] is compatible with our claim of convergence to the bosonic canonical state if we do not pretend that typical bosonic eigenstates are typical states, as illustrated in Figure 1.
5. Quantum entropy and quantum information
Since the bosonic density operator is given in eigendecomposition form, its von Neumann entropy is equal to the Shannon entropy of the multivariate hypergeometric (multinomial) distribution. For this reason, in the following we make no distinction between the von Neumann entropy and the Shannon entropy, calling both simply entropy. It is worth emphasizing that here we completely skip the notion of phase space, leading to the exact probability distribution of the quantum occupancy numbers and, as a consequence, to the exact entropy. Conversely, the standard phase-space approach inherently leads to approximations of entropy that call for improvement at low temperature-to-density ratio, see e.g. [26], while still remaining approximations.
The entropy of the multivariate hypergeometric distribution, which, owing to the distribution's symmetry, is equal to the entropy of both the system and the environment after their separation, is
where $S(\cdot)$ is the von Neumann entropy of the density operator inside the round brackets, $E\{\cdot\}$ is the classical expectation computed over the probability distribution of the random variable inside the braces, the base of the logarithm is Euler's number, and the Boltzmann constant in front of the logarithm is omitted for brevity. The first term is the Gibbs entropy of microstates, i.e. the entropy of the system of distinguishable particles, while the remaining terms are due to the indistinguishability of particles which, on average, prevents access to the corresponding units of information. Since the conditional probability of microstates given the macrostate appears inside the expectation, the expectation is the conditional Shannon entropy of microstates given the macrostate, so the entropy (7) is the mutual information between microstates and macrostates. The term $\ln N!$ in (7) was introduced by Gibbs to force compatibility between the non-quantized phase-space (differential) entropy of microstates and the physical entropy of systems of indistinguishable particles. We observe that, while the probability that two or more classical particles have the same position and momentum in phase space is zero, because position and momentum are continuous variables, the probability that two or more quantum particles occupy the same quantum state is not zero. This non-zero probability is captured by the sum of expectations in (7). As the entropy of microstates becomes lower and lower, this sum approaches $\ln N!$, becoming equal to $\ln N!$ when all the particles occupy the ground state. This prevents the system's entropy from becoming negative even when the entropy of microstates is vanishingly small.
In the canonical case, the entropy of the bosonic density operator is the entropy of the multinomial distribution:
see [27] and [28] for the calculation of the above entropy, and see also [24] for approximations of the entropy of the multinomial distribution in the context of statistical mechanics. Equation (8) is Equation (11) of [24], where the authors call the entropy of the distribution of the occupancy numbers entropy fluctuations. Apart from certain exceptions, the authors of [24] consider these "entropy fluctuations" negligible compared to the "entropy" of the system, failing to recognize that the entropy of the occupancy numbers is the thermodynamic entropy of a system of indistinguishable particles.
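Decompositions of this kind can be checked by brute force. The following sketch verifies, on a small three-category example of our choosing, that the entropy of the multinomial occupancy numbers equals the Gibbs entropy of microstates $N H(p)$ minus the expected log-multiplicity $E[\ln(N!/\prod_c n_c!)]$:

```python
import math
from itertools import product as iproduct

def multinomial_entropy_decomposition(N, p):
    """Return (H of the occupancy numbers, N*H(p) - E[ln(N!/prod n_c!)]).
    The two values coincide for the multinomial distribution."""
    states = [n for n in iproduct(range(N + 1), repeat=len(p)) if sum(n) == N]
    H = 0.0
    E_logcoef = 0.0
    for n in states:
        coef = math.factorial(N) // math.prod(math.factorial(nc) for nc in n)
        prob = coef * math.prod(pc**nc for pc, nc in zip(p, n))
        H -= prob * math.log(prob)
        E_logcoef += prob * math.log(coef)
    NHp = -N * sum(pc * math.log(pc) for pc in p)
    return H, NHp - E_logcoef

lhs, rhs = multinomial_entropy_decomposition(6, (0.5, 0.3, 0.2))
print(lhs, rhs)  # the two sides coincide up to floating point
```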
The following inequalities sandwich the Boltzmann entropy between the two terms on the right-hand side of (8):
where, with some abuse of notation, the factorial of a real number $x$ in the denominator is intended as $x! = \Gamma(x+1)$, where $\Gamma$ is the Gamma function. The first inequality is (11.22) of [19], while the second inequality is obtained by applying Jensen's inequality to the convex function $x \mapsto \ln(x!)$. In statistical mechanics it is standard to derive from Stirling's formula an approximation between the two terms of (9).
Note that, if we pretend that entropy is a variable of state, then the probability distribution of the occupancy numbers must depend only on the state of the system. However, the multivariate hypergeometric distribution depends also on the state of the universe. The dependency becomes weaker and weaker as the number of particles of the universe tends to infinity, but it never disappears, which makes the empirical approach incompatible with the notion of entropy as a variable of state. In conclusion, entropy can be a variable of state only if we accept the Bayesian approach.
We hereafter introduce the empirical information, sketching a new and engaging connection between the Bayesian and the empirical approaches to information in quantum measurement theory. Let us regard the PVM operated on the universe as a POVM operated on the system. The difference between the entropy of the multinomial Bayesian marginal and the expectation, over the multinomial Bayesian prior, of the entropy of the multivariate hypergeometric Bayesian likelihood of the system is equal to the Holevo upper bound on the accessible quantum information that any POVM can achieve [29]:
Whatever the value of $U$, in place of the above Bayesian information one could consistently consider the following empirical information, which does not need the definition of a prior:
where the first term is the Shannon entropy of the empirical multinomial distribution of the occupancy numbers of the system. In (10), the one-particle probability distribution is the same for the multivariate hypergeometric distribution and for the empirical multinomial distribution, therefore the inequality is guaranteed by the maximum-entropy property of the multinomial distribution demonstrated in [22]. The difference (10) is the "empirical information" brought by the PVM operated on the universe about the system. As $U \to \infty$, both distributions tend to the multinomial distribution, and both the Bayesian information and the empirical information tend to zero. If, after the POVM, a PVM is operated on the system, the total information brought by the two measurements takes one expression in the Bayesian approach and another in the empirical approach; as $U \to \infty$, the total empirical information becomes equal to the total Bayesian information.
6. Entropy of the ideal gas in a container
In the case of an ideal monoatomic gas in a cubic container of side $L$, one particle of the gas is modelled as a quantum "particle in a box" with three degrees of freedom, whose energy eigenvalues with aperiodic (hard-wall) boundary conditions are
$$E_c = \frac{h^2 \left(c_1^2 + c_2^2 + c_3^2\right)}{8 m L^2},$$
where $c$ consists of the three quantum numbers $(c_1, c_2, c_3)$, $m$ is the mass of the particle and $h$ is the Planck constant. The one-particle Boltzmann distribution for a gas at thermal equilibrium at temperature $T$ kelvin is
$$p_c = \frac{e^{-E_c/(k_B T)}}{Z},$$
and the associated multinomial distribution of the occupancy numbers $n = (n_c)$ for a gas of $N$ particles is
$$P(n) = \frac{N!}{\prod_c n_c!} \prod_c p_c^{\,n_c},$$
where $k_B \approx 1.380649 \cdot 10^{-23}$ J/K is the Boltzmann constant and $Z$ is the one-particle partition function:
$$Z = \sum_c e^{-E_c/(k_B T)}.$$
When the temperature-to-density ratio is high, it becomes possible to employ two approximations. In the first one, the partition function is approximated by an integral, see eqn. 19.54 of [30], leading to
$$Z \approx V \left(\frac{2 \pi m k_B T}{h^2}\right)^{3/2}, \qquad V = L^3.$$
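The quality of the integral approximation is easy to probe numerically. The sketch below (parameters of our choosing: a helium-4 atom in a 10 nm one-dimensional box at 300 K) compares the exact one-dimensional partition function with its continuum approximation:

```python
import math

h  = 6.62607015e-34      # Planck constant, J*s
kB = 1.380649e-23        # Boltzmann constant, J/K
m  = 6.6464731e-27       # mass of a helium-4 atom, kg (illustrative choice)
L  = 1e-8                # box size, m (illustrative choice)
T  = 300.0               # temperature, K

eps = h**2 / (8 * m * L**2)   # energy scale of the 1D levels E_n = eps * n^2
beta = 1.0 / (kB * T)

# Exact 1D partition function: sum over the quantum number n = 1, 2, ...
Z1_exact = sum(math.exp(-beta * eps * n**2) for n in range(1, 100000))
# Continuum (integral) approximation: L * sqrt(2*pi*m*kB*T) / h
Z1_integral = L * math.sqrt(2 * math.pi * m * kB * T) / h

print(Z1_exact, Z1_integral)  # close at high temperature-to-density ratio
```

At these parameters the two values agree to within a fraction of a percent; the residual gap is the Euler-Maclaurin boundary correction.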
In the second one, with the idea that the probability that two or more particles occupy the same state is negligible, the denominator of the multinomial coefficient is ignored and, for a large number of particles, the logarithm of the numerator $N!$ is approximated by $N \ln N - N$. Plugging the two approximations into (8), one gets the textbook Sackur-Tetrode formula:
$$\frac{S}{N k_B} = \ln\!\left[\frac{V}{N}\left(\frac{2 \pi m k_B T}{h^2}\right)^{3/2}\right] + \frac{5}{2}.$$
Note that, as already mentioned in the introduction, the exact entropy of the multinomial distribution (8) is guaranteed to be non-negative, while the Sackur-Tetrode formula becomes negative at low temperature-to-density ratio, where the two mentioned approximations do not hold; see [31,32] for details.
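The sign behavior can be reproduced directly from the Sackur-Tetrode formula; in the sketch below, the particle mass and number density are illustrative choices of ours:

```python
import math

h  = 6.62607015e-34   # Planck constant, J*s
kB = 1.380649e-23     # Boltzmann constant, J/K
m  = 6.6464731e-27    # helium-4 mass, kg (illustrative choice)

def sackur_tetrode_per_particle(T, n_density):
    """S/(N*kB) = ln[(1/n) * (2*pi*m*kB*T/h^2)^(3/2)] + 5/2,
    with n = N/V the number density."""
    lam = h / math.sqrt(2 * math.pi * m * kB * T)   # thermal de Broglie wavelength
    return math.log(1.0 / (n_density * lam**3)) + 2.5

n = 2.7e25  # m^-3, roughly a gas at ambient pressure (illustrative)
print(sackur_tetrode_per_particle(300.0, n))   # positive at room temperature
print(sackur_tetrode_per_particle(1e-3, n))    # negative at very low T: the pathology
```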
7. Future work
We hereafter sketch the application of our proposed approach to the state of the Szilard engine after the insertion of the piston, leaving the complete analysis of the Szilard cycle to future work. In the case of $N$ particles, after the insertion of the piston, the number $b$ of particles found in one of the two sub-containers of volume $V/2$ is a binomial random variable with parameters $(N, 1/2)$, where $V$ is the total volume. The probability distribution of the occupancy numbers is
where the two factors are the probability distributions of the occupancy numbers of a gas with $b$ and $N-b$ particles, respectively, in the sub-containers of volume $V/2$. For $N = 1$ and equiprobable sub-containers, the entropy after the insertion of the wall is
where the famous $\ln 2$ of Landauer [33] comes from the binary equiprobable random variable $b$, and the last term involves the probability distribution of one particle in a box of volume $V/2$. We have evaluated the partition function of the Boltzmann distribution with the parametrization of [34] (particle mass in kg, temperature in K, and size of the one-dimensional box in m as given there). We obtain the entropy, in $k_B$ units, of the single particle with one degree of freedom before the insertion of the piston and, with the size of the one-dimensional box halved, after the insertion of the piston; the resulting difference is in excellent agreement with the entropy fall shown in Figure 3 of [34], where the result is derived by the phase-space approach.
8. Conclusion
Entropy is a macroscopic property of a physical system and, at the same time, a mathematical property of randomness. As such, it must be a property of the randomness of the system's macrostates. However, despite the necessity of introducing the somewhat ad hoc term $\ln N!$ as a kind of deus ex machina, from Boltzmann to the present day the "entropy" of a system at equilibrium has commonly been intended as the entropy of microstates, be it the Gibbs entropy or, to some extent, the Boltzmann entropy. This author has come to the conclusion that the misunderstanding arises from the following two conceptual errors.
The first error consists in regarding the Boltzmann entropy as the "entropy of a macrostate." Even an exceptionally deep author like Jaynes writes in [35]:
To emphasize this, note that a "thermodynamic state" denoted by $X$ defines a large class $C$ of microstates compatible with $X$. Boltzmann, Planck, and Einstein showed that we may interpret the entropy of a macrostate as $k \log W$, where $W$ is the phase volume occupied by all the microstates in the chosen reference class $C$.
This misconception pervades the whole of statistical mechanics. The influential authors of [36] attribute to Einstein the idea that an individual macrostate can have entropy:
In fact, already Einstein (1914, Eq. (4a)) argued that the entropy of a macro state should be proportional to the log of the "number of elementary quantum states" compatible with that macro state... .
But entropy is a property of a statistical ensemble, therefore one individual macrostate cannot have entropy. Moreover, if by entropy of the macrostate we mean the entropy of the microstates that constitute it, then we must acknowledge that this entropy has no physical significance, because the elementary quantum states compatible with that macrostate are physically meaningless when particles are indistinguishable. This misconception is so widespread that it can be found also in standard textbooks. We hereafter quote a passage from the introduction to chapter 16 of [30]:
specification of a macrostate constitutes incomplete information.
In the absence of a formal definition of information, this statement risks becoming misleading. Actually, we have shown that the entropy of the occupancy numbers is the complete information about the system, because the entropy of microstates belonging to the same macrostate, due to the indistinguishability of particles, is not informative, and is indeed subtracted from the entropy of microstates in (8).
The second error is the lack of consideration of the absolute randomness of macrostates. Navigating the literature of statistical mechanics, one may come across passages like the following one from [23]:
A crucial observation in statistical mechanics is that the distribution of all macrostate variables gets sharply peaked and narrow as system size N increases. ... In the limit the probability of measuring a macrostate becomes a Dirac delta...
Clearly, the quoted statement is wrong, because the absolute width of the probability distribution of the occupancy numbers and, more generally, of any macroscopic observable, as for instance the system's energy, becomes broader and broader as the number of particles grows, while, by the LLN, it is its relative width, i.e. the width compared to the mean value, that sharpens. Indeed, the entropy of the occupancy numbers divided by $N$ vanishes as $N \to \infty$, which shows that the relative randomness, i.e. the randomness per particle, becomes vanishingly small. Meanwhile, the absolute randomness, represented by the entropy of the occupancy numbers itself, increases as $N$ becomes larger and larger.
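The contrast between absolute and relative randomness is easy to visualize with a two-state toy system (our own illustration): the entropy of the occupancy number grows with $N$, while the entropy per particle shrinks:

```python
import math

def binomial_entropy(N, p=0.5):
    """Shannon entropy (nats) of the occupancy number of one of two
    equiprobable one-particle states in an N-particle system."""
    H = 0.0
    for k in range(N + 1):
        prob = math.comb(N, k) * p**k * (1 - p) ** (N - k)
        if prob > 0:
            H -= prob * math.log(prob)
    return H

for N in (10, 100, 1000):
    H = binomial_entropy(N)
    print(N, H, H / N)   # absolute entropy grows, entropy per particle shrinks
```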
In the end, the lack of a formal specification of the physical role of microstates, together with the lack of consideration of the absolute randomness of macrostates, leads to the questionable belief that all the properties of a system of a large number of particles, including "entropy," depend on the macrostate, because it contains "the overwhelming majority" of microstates. From this standpoint, the subtraction of $\ln N!$ is seen as a technical maneuver that becomes necessary to circumvent the challenge posed by the indistinguishability of particles. Note that the idea itself that one macrostate can contain the overwhelming majority of microstates is inherently questionable, because the ratio between the number of microstates in any individual macrostate and the total number of microstates tends to zero, whichever the macrostate, as $N \to \infty$.
In conclusion, this paper has made clear that, since the quantum state of a system of indistinguishable particles (bosons) is completely specified by the occupancy numbers of the quantum states allowed to the system's particles, the entropy of the physical system is the Shannon entropy of the random occupancy numbers, which is obtained by subtracting the expectation of the logarithm of the multinomial coefficient from the entropy of microstates. Recognizing that this expectation is the conditional Shannon entropy of microstates given the macrostate, we equivalently express the above concept by saying that the entropy of the physical system is equal to the mutual information between microstates and macrostates.